From 0b7467e00fbd3ffa90cd6fa997227353adba54c5 Mon Sep 17 00:00:00 2001
From: Xingyao Wang
Date: Tue, 21 Oct 2025 14:14:07 -0400
Subject: [PATCH 01/58] docs: standardize SDK guides format and merge reasoning
 documentation

- Standardized all 30 SDK guide files with consistent format:
  - Added components with GitHub example links
  - Used auto-sync code blocks with 'icon' and 'expandable' attributes
  - Added 'Running the Example' bash code blocks
  - Included brief explanations with line highlights
  - Added 'Next Steps' sections with related links
- Merged anthropic-thinking.mdx and responses-reasoning.mdx into model-reasoning.mdx:
  - Created unified guide covering both Anthropic thinking blocks and OpenAI responses reasoning
  - Single example demonstrating both approaches
  - Deleted old separate documentation files
- Updated docs.json navigation:
  - Organized all SDK guides into logical categories:
    * Getting Started (hello-world, custom-tools, mcp)
    * Agent Configuration (llm-registry, llm-routing, model-reasoning)
    * Conversation Management (persistence, pause-and-resume, confirmation-mode, etc.)
    * Agent Capabilities (activate-skill, async, planning-agent-workflow, etc.)
    * Agent Behavior (stuck-detector, interactive-terminal)

Co-authored-by: openhands
---
 docs.json                                     |  50 +-
 sdk/arch/agent_server/overview.mdx            | 433 ++++++++++++++++
 sdk/arch/overview.mdx                         | 142 +++++
 sdk/arch/sdk/agent.mdx                        | 301 +++++++++++
 sdk/arch/sdk/condenser.mdx                    | 166 ++++++
 sdk/arch/sdk/conversation.mdx                 | 487 ++++++++++++++++++
 sdk/arch/sdk/event.mdx                        | 403 +++++++++++++++
 sdk/arch/sdk/llm.mdx                          | 416 +++++++++++++++
 sdk/arch/sdk/mcp.mdx                          | 333 ++++++++++++
 sdk/arch/sdk/microagents.mdx                  | 225 ++++++++
 sdk/arch/sdk/security.mdx                     | 416 +++++++++++++++
 sdk/arch/sdk/tool.mdx                         | 199 +++++++
 sdk/arch/sdk/workspace.mdx                    | 322 ++++++++++++
 sdk/arch/tools/bash.mdx                       | 288 +++++++++++
 sdk/arch/tools/browser_use.mdx                | 101 ++++
 sdk/arch/tools/file_editor.mdx                | 338 ++++++++++++
 sdk/arch/tools/glob.mdx                       |  89 ++++
 sdk/arch/tools/grep.mdx                       | 140 +++++
 sdk/arch/tools/overview.mdx                   | 185 +++++++
 sdk/arch/tools/planning_file_editor.mdx       | 128 +++++
 sdk/arch/tools/task_tracker.mdx               | 146 ++++++
 sdk/arch/workspace/docker.mdx                 | 330 ++++++++++++
 sdk/arch/workspace/overview.mdx               |  99 ++++
 sdk/arch/workspace/remote_api.mdx             | 325 ++++++++++++
 sdk/guides/activate-skill.mdx                 | 178 +++++++
 sdk/guides/async.mdx                          | 149 ++++++
 sdk/guides/browser-use.mdx                    | 117 +++++
 sdk/guides/confirmation-mode.mdx              | 193 +++++++
 sdk/guides/context-condenser.mdx              | 175 +++++++
 sdk/guides/conversation-costs.mdx             | 155 ++++++
 sdk/guides/custom-secrets.mdx                 | 104 ++++
 sdk/guides/github-workflows/pr-review.mdx     |  65 +++
 .../github-workflows/routine-maintenance.mdx  |  74 +++
 sdk/guides/image-input.mdx                    | 138 +++++
 sdk/guides/interactive-terminal.mdx           | 126 +++++
 sdk/guides/llm-metrics.mdx                    | 131 +++++
 sdk/guides/llm-registry.mdx                   | 142 +++++
 sdk/guides/llm-routing.mdx                    | 136 +++++
 sdk/guides/model-reasoning.mdx                | 261 ++++++++++
 sdk/guides/pause-and-resume.mdx               | 121 +++++
 sdk/guides/persistence.mdx                    | 158 ++++++
 sdk/guides/planning-agent-workflow.mdx        | 172 +++++++
 .../api-sandboxed-server.mdx                  |  42 ++
 .../browser-with-docker.mdx                   |  44 ++
 .../docker-sandboxed-server.mdx               | 184 +++++++
 .../local-agent-server.mdx                    |  91 ++++
 .../vscode-with-docker.mdx                    |  44 ++
 sdk/guides/security-analyzer.mdx              | 174 +++++++
 sdk/guides/send-message-while-processing.mdx  | 184 +++++++
 sdk/guides/stuck-detector.mdx                 | 101 ++++
 50 files changed, 9518 insertions(+), 3 deletions(-)
 create mode 100644 sdk/arch/agent_server/overview.mdx
 create mode 100644 sdk/arch/overview.mdx
 create mode 100644 sdk/arch/sdk/agent.mdx
 create mode 100644 sdk/arch/sdk/condenser.mdx
 create mode 100644 sdk/arch/sdk/conversation.mdx
 create mode 100644 sdk/arch/sdk/event.mdx
 create mode 100644 sdk/arch/sdk/llm.mdx
 create mode 100644 sdk/arch/sdk/mcp.mdx
 create mode 100644 sdk/arch/sdk/microagents.mdx
 create mode 100644 sdk/arch/sdk/security.mdx
 create mode 100644 sdk/arch/sdk/tool.mdx
 create mode 100644 sdk/arch/sdk/workspace.mdx
 create mode 100644 sdk/arch/tools/bash.mdx
 create mode 100644 sdk/arch/tools/browser_use.mdx
 create mode 100644 sdk/arch/tools/file_editor.mdx
 create mode 100644 sdk/arch/tools/glob.mdx
 create mode 100644 sdk/arch/tools/grep.mdx
 create mode 100644 sdk/arch/tools/overview.mdx
 create mode 100644 sdk/arch/tools/planning_file_editor.mdx
 create mode 100644 sdk/arch/tools/task_tracker.mdx
 create mode 100644 sdk/arch/workspace/docker.mdx
 create mode 100644 sdk/arch/workspace/overview.mdx
 create mode 100644 sdk/arch/workspace/remote_api.mdx
 create mode 100644 sdk/guides/activate-skill.mdx
 create mode 100644 sdk/guides/async.mdx
 create mode 100644 sdk/guides/browser-use.mdx
 create mode 100644 sdk/guides/confirmation-mode.mdx
 create mode 100644 sdk/guides/context-condenser.mdx
 create mode 100644 sdk/guides/conversation-costs.mdx
 create mode 100644 sdk/guides/custom-secrets.mdx
 create mode 100644 sdk/guides/github-workflows/pr-review.mdx
 create mode 100644 sdk/guides/github-workflows/routine-maintenance.mdx
 create mode 100644 sdk/guides/image-input.mdx
 create mode 100644 sdk/guides/interactive-terminal.mdx
 create mode 100644 sdk/guides/llm-metrics.mdx
 create mode 100644 sdk/guides/llm-registry.mdx
 create mode 100644 sdk/guides/llm-routing.mdx
 create mode 100644 sdk/guides/model-reasoning.mdx
 create mode 100644 sdk/guides/pause-and-resume.mdx
 create mode 100644 sdk/guides/persistence.mdx
 create mode 100644 sdk/guides/planning-agent-workflow.mdx
 create mode 100644 sdk/guides/remote-agent-server/api-sandboxed-server.mdx
 create mode 100644 sdk/guides/remote-agent-server/browser-with-docker.mdx
 create mode 100644 sdk/guides/remote-agent-server/docker-sandboxed-server.mdx
 create mode 100644 sdk/guides/remote-agent-server/local-agent-server.mdx
 create mode 100644 sdk/guides/remote-agent-server/vscode-with-docker.mdx
 create mode 100644 sdk/guides/security-analyzer.mdx
 create mode 100644 sdk/guides/send-message-while-processing.mdx
 create mode 100644 sdk/guides/stuck-detector.mdx

diff --git a/docs.json b/docs.json
index dd4215a0..2355244e 100644
--- a/docs.json
+++ b/docs.json
@@ -181,9 +181,53 @@
       {
         "group": "Guides",
         "pages": [
-          "sdk/guides/hello-world",
-          "sdk/guides/custom-tools",
-          "sdk/guides/mcp",
+          {
+            "group": "Getting Started",
+            "pages": [
+              "sdk/guides/hello-world",
+              "sdk/guides/custom-tools",
+              "sdk/guides/mcp"
+            ]
+          },
+          {
+            "group": "Agent Configuration",
+            "pages": [
+              "sdk/guides/llm-registry",
+              "sdk/guides/llm-routing",
+              "sdk/guides/model-reasoning"
+            ]
+          },
+          {
+            "group": "Conversation Management",
+            "pages": [
+              "sdk/guides/persistence",
+              "sdk/guides/pause-and-resume",
+              "sdk/guides/confirmation-mode",
+              "sdk/guides/send-message-while-processing",
+              "sdk/guides/conversation-costs",
+              "sdk/guides/llm-metrics",
+              "sdk/guides/context-condenser"
+            ]
+          },
+          {
+            "group": "Agent Capabilities",
+            "pages": [
+              "sdk/guides/activate-skill",
+              "sdk/guides/async",
+              "sdk/guides/planning-agent-workflow",
+              "sdk/guides/browser-use",
+              "sdk/guides/image-input",
+              "sdk/guides/custom-secrets",
+              "sdk/guides/security-analyzer"
] + }, + { + "group": "Agent Behavior", + "pages": [ + "sdk/guides/stuck-detector", + "sdk/guides/interactive-terminal" + ] + }, { "group": "Remote Agent Server", "pages": [ diff --git a/sdk/arch/agent_server/overview.mdx b/sdk/arch/agent_server/overview.mdx new file mode 100644 index 00000000..593cb646 --- /dev/null +++ b/sdk/arch/agent_server/overview.mdx @@ -0,0 +1,433 @@ +--- +title: Agent Server +description: HTTP server for remote agent execution with Docker-based sandboxing and API access. +--- + +The Agent Server provides HTTP API endpoints for remote agent execution. It enables centralized agent management, multi-user support, and production deployments. + +**Source**: [`openhands/agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server) + +## Purpose + +The Agent Server enables: +- **Remote Execution**: Run agents on dedicated servers +- **Multi-User Support**: Isolate execution per user +- **Resource Management**: Centralized resource allocation +- **API Access**: HTTP API for agent operations +- **Production Deployment**: Scalable agent infrastructure + +## Architecture + +```mermaid +graph TD + Client[Client SDK] -->|HTTPS| Server[Agent Server] + Server --> Router[FastAPI Router] + + Router --> Workspace[Workspace API] + Router --> Health[Health Check] + + Workspace --> Docker[Docker Manager] + Docker --> Container1[Container 1] + Docker --> Container2[Container 2] + + style Client fill:#e1f5fe + style Server fill:#fff3e0 + style Router fill:#e8f5e8 + style Docker fill:#f3e5f5 +``` + +## Quick Start + +### Using Pre-built Docker Image + +```bash +# Pull latest image +docker pull ghcr.io/all-hands-ai/agent-server:latest + +# Run server +docker run -d \ + -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest +``` + +### Using Python + +```bash +# Install agent-server package +pip install openhands-agent-server + +# Start server +openhands-agent-server +``` + +## 
Building Docker Images + +**Source**: [`openhands/agent_server/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server/docker) + +### Build Script + +```bash +# Build from source +python -m openhands.agent_server.docker.build \ + --base-image ubuntu:22.04 \ + --target runtime \ + --platform linux/amd64 +``` + +### Build Options + +| Option | Description | Default | +|--------|-------------|---------| +| `--base-image` | Base Docker image | `ubuntu:22.04` | +| `--target` | Build target (`runtime` or `dev`) | `runtime` | +| `--platform` | Target platform | Host platform | +| `--output-image` | Output image name | Auto-generated | + +### Programmatic Build + +```python +from openhands.agent_server.docker.build import ( + BuildOptions, + build +) + +# Build custom image +image_name = build( + BuildOptions( + base_image="python:3.12", + target="runtime", + platform="linux/amd64" + ) +) + +print(f"Built image: {image_name}") +``` + +## Docker Images + +### Official Images + +```bash +# Latest release +ghcr.io/all-hands-ai/agent-server:latest + +# Specific version +ghcr.io/all-hands-ai/agent-server:v1.0.0 + +# Development build +ghcr.io/all-hands-ai/agent-server:dev +``` + +### Image Variants + +- **`runtime`**: Production-ready, minimal size +- **`dev`**: Development tools included + +## API Endpoints + +### Health Check + +```bash +GET /api/health +``` + +Returns server health status. 
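The health endpoint above is easy to exercise from Python. The sketch below assumes only what this page documents (the `/api/health` path and the `localhost:8000` default from Quick Start); the helper names `health_url` and `check_health` are illustrative, not part of the SDK.

```python
# Sketch of a minimal client for the health endpoint documented above.
# Only the /api/health path and the localhost:8000 default come from this
# page; the helper names are illustrative.
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL from the server's base URL."""
    return base_url.rstrip("/") + "/api/health"


def check_health(base_url: str = "http://localhost:8000") -> dict:
    """GET /api/health and return the parsed JSON body."""
    with urllib.request.urlopen(health_url(base_url), timeout=5) as resp:
        return json.load(resp)
```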
+ +### Execute Command + +```bash +POST /api/workspace/command +Content-Type: application/json +Authorization: Bearer + +{ + "command": "python script.py", + "working_dir": "/workspace", + "timeout": 30.0 +} +``` + +### File Upload + +```bash +POST /api/workspace/upload +Authorization: Bearer +Content-Type: multipart/form-data + +# Form data with file +``` + +### File Download + +```bash +GET /api/workspace/download?path=/workspace/output.txt +Authorization: Bearer +``` + +## Configuration + +### Environment Variables + +```bash +# Server configuration +export HOST=0.0.0.0 +export PORT=8000 +export API_KEY=your-secret-key + +# Docker configuration +export DOCKER_HOST=unix:///var/run/docker.sock + +# Logging +export LOG_LEVEL=INFO +export DEBUG=false +``` + +### Server Settings + +```python +# config.py +class Settings: + host: str = "0.0.0.0" + port: int = 8000 + api_key: str = "your-secret-key" + workers: int = 4 + timeout: float = 300.0 +``` + +## Deployment + +### Docker Compose + +```yaml +# docker-compose.yml +version: '3.8' + +services: + agent-server: + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - "8000:8000" + volumes: + - /var/run/docker.sock:/var/run/docker.sock + environment: + - API_KEY=your-secret-key + - LOG_LEVEL=INFO + restart: unless-stopped +``` + +### Kubernetes + +```yaml +# deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: agent-server +spec: + replicas: 3 + selector: + matchLabels: + app: agent-server + template: + metadata: + labels: + app: agent-server + spec: + containers: + - name: agent-server + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - containerPort: 8000 + env: + - name: API_KEY + valueFrom: + secretKeyRef: + name: agent-server-secrets + key: api-key +``` + +### Systemd Service + +```ini +# /etc/systemd/system/agent-server.service +[Unit] +Description=OpenHands Agent Server +After=docker.service +Requires=docker.service + +[Service] +Type=simple +ExecStart=/usr/bin/docker run \ 
+ --rm \ + -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest + +Restart=always +RestartSec=10 + +[Install] +WantedBy=multi-user.target +``` + +## Security + +### Authentication + +```python +# API key authentication +from fastapi import Header, HTTPException + +async def verify_api_key(authorization: str = Header(None)): + if not authorization or not authorization.startswith("Bearer "): + raise HTTPException(status_code=401) + + api_key = authorization.split(" ")[1] + if api_key != expected_api_key: + raise HTTPException(status_code=403) +``` + +### Container Isolation + +- Each request executes in separate Docker container +- Containers have resource limits +- Network isolation between containers +- Automatic cleanup after execution + +### Rate Limiting + +```python +# Implement rate limiting per API key +from slowapi import Limiter + +limiter = Limiter(key_func=lambda: request.headers.get("Authorization")) + +@app.post("/api/workspace/command") +@limiter.limit("100/minute") +async def execute_command(...): + ... 
+``` + +## Monitoring + +### Health Checks + +```bash +# Check if server is running +curl http://localhost:8000/api/health + +# Response: +# {"status": "healthy", "version": "1.0.0"} +``` + +### Logging + +```python +# Structured logging +import logging + +logger = logging.getLogger("agent_server") +logger.info("Request received", extra={ + "user_id": user_id, + "command": command, + "duration": duration +}) +``` + +### Metrics + +Track important metrics: +- Request rate and latency +- Container creation/cleanup time +- Resource usage per container +- Error rates and types + +## Troubleshooting + +### Server Won't Start + +```bash +# Check port availability +netstat -tuln | grep 8000 + +# Check Docker socket +docker ps + +# Check logs +docker logs agent-server +``` + +### Container Creation Fails + +```bash +# Verify Docker permissions +docker run hello-world + +# Check Docker socket mount +ls -la /var/run/docker.sock + +# Check available resources +docker stats +``` + +### Performance Issues + +```bash +# Check resource usage +docker stats + +# Increase worker count +export WORKERS=8 + +# Optimize container startup +# Use pre-built images +# Reduce image size +``` + +## Best Practices + +1. **Use Pre-built Images**: Faster startup, consistent environment +2. **Set Resource Limits**: Prevent resource exhaustion +3. **Enable Monitoring**: Track performance and errors +4. **Implement Rate Limiting**: Prevent abuse +5. **Secure API Keys**: Use strong, rotated keys +6. **Use HTTPS**: Encrypt data in transit +7. **Regular Updates**: Keep images updated +8. 
**Backup Configuration**: Version control configurations + +## Development + +### Running Locally + +```bash +# Clone repository +git clone https://github.com/All-Hands-AI/agent-sdk.git +cd agent-sdk + +# Install dependencies +pip install -e ".[server]" + +# Run development server +uvicorn openhands.agent_server.main:app --reload +``` + +### Testing + +```bash +# Run tests +pytest openhands/agent_server/tests/ + +# Test specific endpoint +curl -X POST http://localhost:8000/api/workspace/command \ + -H "Authorization: Bearer test-key" \ + -H "Content-Type: application/json" \ + -d '{"command": "echo test", "working_dir": "/workspace"}' +``` + +## See Also + +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based local execution +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Client for agent server +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Server usage examples +- **[FastAPI Documentation](https://fastapi.tiangolo.com/)** - Web framework used diff --git a/sdk/arch/overview.mdx b/sdk/arch/overview.mdx new file mode 100644 index 00000000..6662ba9b --- /dev/null +++ b/sdk/arch/overview.mdx @@ -0,0 +1,142 @@ +--- +title: Overview +description: A modular framework for building AI agents, organized into four packages for clarity and extensibility. +--- + +The OpenHands Agent SDK is organized into four packages, each serving a distinct purpose in the agent development lifecycle. + +## Package Structure + +```mermaid +graph TD + SDK[SDK Package
Core Framework] --> Tools[Tools Package
Built-in Tools] + SDK --> Workspace[Workspace Package
Execution Environments] + SDK --> AgentServer[Agent Server Package
Remote Execution] + + Tools -.->|Used by| SDK + Workspace -.->|Used by| SDK + AgentServer -.->|Hosts| SDK + + style SDK fill:#e1f5fe + style Tools fill:#e8f5e8 + style Workspace fill:#fff3e0 + style AgentServer fill:#f3e5f5 +``` + +## 1. SDK Package + +Core framework for building agents locally. + +**Key Components:** +- **[Tool System](/sdk/architecture/sdk/tool)** - Define custom capabilities +- **[Microagents](/sdk/architecture/sdk/microagents)** - Specialized behavior modules +- **[Condenser](/sdk/architecture/sdk/condenser)** - Memory management +- **[Agent](/sdk/architecture/sdk/agent)** - Base agent interface +- **[Workspace](/sdk/architecture/sdk/workspace)** - Execution abstraction +- **[Conversation](/sdk/architecture/sdk/conversation)** - Lifecycle management +- **[Event](/sdk/architecture/sdk/event)** - Event system +- **[LLM](/sdk/architecture/sdk/llm)** - Language model integration +- **[MCP](/sdk/architecture/sdk/mcp)** - Model Context Protocol +- **[Security](/sdk/architecture/sdk/security)** - Security framework + +## 2. Tools Package + +Production-ready tool implementations. + +**Available Tools:** +- **[BashTool](/sdk/architecture/tools/bash)** - Command execution +- **[FileEditorTool](/sdk/architecture/tools/file_editor)** - File manipulation +- **[GlobTool](/sdk/architecture/tools/glob)** - File discovery +- **[GrepTool](/sdk/architecture/tools/grep)** - Content search +- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker)** - Task management +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor)** - Multi-file workflows +- **[BrowserUseTool](/sdk/architecture/tools/browser_use)** - Web interaction + +## 3. Workspace Package + +Advanced execution environments for production. 
+ +**Workspace Types:** +- **[DockerWorkspace](/sdk/architecture/workspace/docker)** - Container-based isolation +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api)** - Remote server execution + +See [Workspace Overview](/sdk/architecture/workspace/overview) for comparison. + +## 4. Agent Server Package + +HTTP server for centralized agent execution. + +**Capabilities:** +- Remote agent execution via API +- Multi-user isolation +- Container management +- Resource allocation + +See [Agent Server Documentation](/sdk/architecture/agent_server/overview). + +## Component Interaction + +```mermaid +graph LR + User[User] -->|Message| Conv[Conversation] + Conv -->|Manages| Agent[Agent] + + Agent -->|Reasons with| LLM[LLM] + Agent -->|Executes| Tools[Tools] + Agent -->|Guided by| Micro[Microagents] + + Tools -->|Run in| Workspace[Workspace] + + style User fill:#e1f5fe + style Conv fill:#fff3e0 + style Agent fill:#f3e5f5 + style LLM fill:#e8f5e8 + style Tools fill:#fce4ec + style Workspace fill:#e0f2f1 +``` + +## Design Principles + +### Immutability & Serialization +All core classes are: +- **Immutable**: State changes create new instances +- **Serializable**: Full conversation state can be saved/restored +- **Type-safe**: Pydantic models ensure data integrity + +### Modularity +- **Composable**: Mix and match components as needed +- **Extensible**: Add custom tools, LLMs, or workspaces +- **Testable**: Each component can be tested in isolation + +### Backward Compatibility +- **Semantic versioning** indicates compatibility levels +- **Migration guides** provided for major changes + +## Getting Started + +New to the SDK? 
Start with the guides: + +- **[Getting Started](/sdk/guides/getting-started)** - Quick introduction +- **[Streaming Mode](/sdk/guides/streaming-mode)** - Execution patterns +- **[Tools & MCP](/sdk/guides/tools-and-mcp)** - Extending capabilities +- **[Workspaces](/sdk/guides/workspaces)** - Execution environments +- **[Sub-agents](/sdk/guides/subagents)** - Agent delegation + +## Deep Dive + +Explore individual components: + +- **SDK Package** - [Tool](/sdk/architecture/sdk/tool) | [Agent](/sdk/architecture/sdk/agent) | [LLM](/sdk/architecture/sdk/llm) | [Conversation](/sdk/architecture/sdk/conversation) +- **Tools Package** - [BashTool](/sdk/architecture/tools/bash) | [FileEditorTool](/sdk/architecture/tools/file_editor) +- **Workspace Package** - [DockerWorkspace](/sdk/architecture/workspace/docker) | [RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api) +- **Agent Server** - [Overview](/sdk/architecture/agent_server/overview) + +## Examples + +Browse the [`examples/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples) directory for practical implementations: + +- **Hello World** - Basic agent usage +- **Custom Tools** - Creating new capabilities +- **Docker Workspace** - Sandboxed execution +- **MCP Integration** - External tool servers +- **Planning Agent** - Multi-step workflows diff --git a/sdk/arch/sdk/agent.mdx b/sdk/arch/sdk/agent.mdx new file mode 100644 index 00000000..3c0da066 --- /dev/null +++ b/sdk/arch/sdk/agent.mdx @@ -0,0 +1,301 @@ +--- +title: Agent +description: Core orchestrator combining language models with tools to execute tasks through structured reasoning loops. +--- + +The Agent orchestrates LLM reasoning with tool execution to solve tasks. It manages the reasoning loop, system prompts, and state transitions while maintaining conversation context. 
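The reasoning loop mentioned above can be pictured as a driver that repeatedly asks the agent for its next step until it signals completion. This is a toy sketch with placeholder names (`StepState`, `run_loop`), not the SDK's actual `Agent`/`ConversationState` API.

```python
# Toy illustration of the agent step loop -- placeholder types,
# not the SDK's real Agent/ConversationState classes.
from dataclasses import dataclass, field


@dataclass
class StepState:
    """Stand-in for conversation state: accumulated events plus a done flag."""
    events: list = field(default_factory=list)
    finished: bool = False


def run_loop(step, state: StepState, max_steps: int = 10) -> StepState:
    """Call the agent's step function until it marks the state finished."""
    for _ in range(max_steps):
        state = step(state)
        if state.finished:
            break
    return state
```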
+ +**Source**: [`openhands/sdk/agent/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent) + +## Core Concepts + +```mermaid +graph TD + Agent[Agent] --> LLM[LLM] + Agent --> Tools[Tools] + Agent --> Context[AgentContext] + Agent --> Condenser[Condenser] + + Context --> Microagents[Microagents] + Tools --> Bash[BashTool] + Tools --> FileEditor[FileEditorTool] + Tools --> MCP[MCP Tools] + + style Agent fill:#e1f5fe + style LLM fill:#fff3e0 + style Tools fill:#e8f5e8 + style Context fill:#f3e5f5 +``` + +An agent combines: +- **LLM**: Language model for reasoning and decision-making +- **Tools**: Capabilities to interact with the environment +- **Context**: Additional knowledge and specialized expertise +- **Condenser**: Memory management for long conversations + +## Base Interface + +**Source**: [`openhands/sdk/agent/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/base.py) + +### AgentBase + +Abstract base class defining the agent interface: + +```python +from openhands.sdk.agent import AgentBase +from openhands.sdk.conversation import ConversationState + +class CustomAgent(AgentBase): + def step(self, state: ConversationState) -> ConversationState: + """Execute one reasoning step and return updated state.""" + # Your agent logic here + return updated_state +``` + +**Key Properties**: +- **Immutable**: Agents are frozen Pydantic models +- **Serializable**: Full agent configuration can be saved/restored +- **Type-safe**: Strict type checking with Pydantic validation + +## Agent Implementation + +**Source**: [`openhands/sdk/agent/agent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/agent.py) + +### Initialization Arguments + +```python +from openhands.sdk import Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from pydantic import SecretStr + +agent = Agent( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ), + tools=[ + 
BashTool.create(), + FileEditorTool.create() + ], + mcp_config={}, # Optional MCP configuration + filter_tools_regex=None, # Optional regex to filter tools + agent_context=None, # Optional context with microagents + condenser=None, # Optional context condenser + security_analyzer=None, # Optional security analyzer + confirmation_policy=None, # Optional confirmation policy +) +``` + +### Key Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `llm` | `LLM` | Language model configuration (required) | +| `tools` | `list[Tool]` | Tools available to the agent | +| `mcp_config` | `dict` | MCP server configuration for external tools | +| `filter_tools_regex` | `str` | Regex to filter available tools | +| `agent_context` | `AgentContext` | Additional context and microagents | +| `condenser` | `CondenserBase` | Context condensation strategy | +| `security_analyzer` | `SecurityAnalyzer` | Security risk analysis | +| `confirmation_policy` | `ConfirmationPolicy` | Action confirmation strategy | + +## Agent Lifecycle + +```mermaid +sequenceDiagram + participant User + participant Conversation + participant Agent + participant LLM + participant Tools + + User->>Conversation: Start conversation + Conversation->>Agent: Initialize state + loop Until task complete + Conversation->>Agent: step(state) + Agent->>LLM: Generate response + LLM->>Agent: Tool calls + reasoning + Agent->>Tools: Execute actions + Tools->>Agent: Observations + Agent->>Conversation: Updated state + end + Conversation->>User: Final result +``` + +### Execution Flow + +1. **Initialization**: Create agent with LLM and tools +2. **State Setup**: Pass agent to conversation +3. **Reasoning Loop**: Conversation calls `agent.step(state)` repeatedly +4. **Tool Execution**: Agent executes tool calls from LLM +5. **State Updates**: Agent returns updated conversation state +6. 
**Termination**: Loop ends when agent calls `FinishTool` + +## Usage Examples + +### Basic Agent + +See [`examples/01_standalone_sdk/01_hello_world.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py): + +```python +from openhands.sdk import Agent, LLM, Conversation +from openhands.tools import BashTool, FileEditorTool +from pydantic import SecretStr + +# Create LLM +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") +) + +# Create agent +agent = Agent( + llm=llm, + tools=[ + BashTool.create(), + FileEditorTool.create() + ] +) + +# Use with conversation +conversation = Conversation(agent=agent) +await conversation.run(user_message="Your task here") +``` + +### Agent with Context + +See [`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py): + +```python +from openhands.sdk import Agent, AgentContext + +# Create context with microagents +context = AgentContext( + microagents=["testing_expert", "code_reviewer"] +) + +agent = Agent( + llm=llm, + tools=tools, + agent_context=context +) +``` + +### Agent with Memory Management + +See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): + +```python +from openhands.sdk.context import LLMCondenser + +condenser = LLMCondenser( + max_tokens=8000, + target_tokens=6000 +) + +agent = Agent( + llm=llm, + tools=tools, + condenser=condenser +) +``` + +### Agent with MCP Tools + +See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py): + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } +} + +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config +) 
+``` + +### Planning Agent Workflow + +See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) for a complete example of multi-phase agent workflows. + +## System Prompts + +**Source**: [`openhands/sdk/agent/prompts/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent/prompts) + +Agents use Jinja2 templates for system prompts. Available templates: + +| Template | Use Case | Source | +|----------|----------|--------| +| `system_prompt.j2` | Default reasoning and tool usage | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt.j2) | +| `system_prompt_interactive.j2` | Interactive conversations | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_interactive.j2) | +| `system_prompt_long_horizon.j2` | Complex multi-step tasks | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_long_horizon.j2) | +| `system_prompt_planning.j2` | Planning-focused workflows | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_planning.j2) | + +### Custom Prompts + +Create custom agent classes with specialized prompts: + +```python +class PlanningAgent(Agent): + system_prompt_filename: str = "system_prompt_planning.j2" +``` + +## Custom Agent Development + +### Extending AgentBase + +```python +from openhands.sdk.agent import AgentBase +from openhands.sdk.conversation import ConversationState + +class SpecializedAgent(AgentBase): + # Custom configuration + max_iterations: int = 10 + + def step(self, state: ConversationState) -> ConversationState: + # Custom reasoning logic + # Tool selection and execution + # State management + return updated_state +``` + +### Multi-Agent Composition + +```python +class WorkflowAgent(AgentBase): + planning_agent: Agent + 
execution_agent: Agent + + def step(self, state: ConversationState) -> ConversationState: + # Phase 1: Planning + plan = self.planning_agent.step(state) + + # Phase 2: Execution + result = self.execution_agent.step(plan) + + return result +``` + +## Best Practices + +1. **Tool Selection**: Provide only necessary tools to reduce complexity +2. **Clear Instructions**: Use detailed user messages for better task understanding +3. **Context Management**: Use condensers for long-running conversations +4. **Error Handling**: Implement proper error recovery strategies +5. **Security**: Use confirmation policies for sensitive operations +6. **Testing**: Test agents with various scenarios and edge cases + +## See Also + +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Defining and using tools +- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing agent conversations +- **[LLM](/sdk/architecture/sdk/llm.mdx)** - Language model configuration +- **[MCP](/sdk/architecture/sdk/mcp.mdx)** - External tool integration +- **[Security](/sdk/architecture/sdk/security.mdx)** - Security and confirmation policies diff --git a/sdk/arch/sdk/condenser.mdx b/sdk/arch/sdk/condenser.mdx new file mode 100644 index 00000000..59d59da6 --- /dev/null +++ b/sdk/arch/sdk/condenser.mdx @@ -0,0 +1,166 @@ +--- +title: Context Condenser +description: Manage agent memory by intelligently compressing conversation history when approaching token limits. +--- + +The context condenser manages agent memory by intelligently compressing conversation history when approaching token limits. This enables agents to maintain coherent context in long-running conversations without exceeding LLM context windows. + +**Source**: [`openhands/sdk/context/condenser/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/condenser) + +## Why Context Condensation? 
+ +```mermaid +graph LR + A[Long Conversation] --> B{Token Limit?} + B -->|Approaching| C[Condense] + B -->|Within Limit| D[Continue] + C --> E[Compressed Context] + E --> F[Agent with Memory] + D --> F + + style A fill:#e1f5fe + style C fill:#fff3e0 + style E fill:#e8f5e8 + style F fill:#f3e5f5 +``` + +As conversations grow, they may exceed LLM context windows. Condensers solve this by: +- Summarizing older messages while preserving key information +- Maintaining recent context in full detail +- Reducing token count without losing conversation coherence + +## LLM Condenser (Default) + +**Source**: [`openhands/sdk/context/condenser/llm_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/condenser/llm_condenser.py) + +The default condenser uses an LLM to intelligently summarize conversation history. + +### How It Works + +1. **Monitor Token Count**: Tracks conversation token usage +2. **Trigger Condensation**: Activates when approaching token threshold +3. **Summarize History**: Uses LLM to compress older messages +4. **Preserve Recent**: Keeps recent messages uncompressed +5. 
**Update Context**: Replaces verbose history with summary + +### Configuration + +```python +from openhands.sdk.context import LLMCondenser + +condenser = LLMCondenser( + max_tokens=8000, # Trigger condensation at this limit + target_tokens=6000, # Reduce to this token count + preserve_recent=10 # Keep last N messages uncompressed +) + +agent = Agent( + llm=llm, + tools=tools, + condenser=condenser +) +``` + +### Example Usage + +See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): + +```python +from openhands.sdk import Agent, LLM +from openhands.sdk.context import LLMCondenser +from pydantic import SecretStr + +# Configure condenser +condenser = LLMCondenser( + max_tokens=8000, + target_tokens=6000 +) + +# Create agent with condenser +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") +) + +agent = Agent( + llm=llm, + tools=tools, + condenser=condenser +) +``` + +## Condensation Strategy + +### Multi-Phase Approach + +```mermaid +sequenceDiagram + participant Agent + participant Condenser + participant LLM + + Agent->>Condenser: Check token count + Condenser->>Condenser: Exceeds threshold? 
+ Condenser->>LLM: Summarize old messages + LLM->>Condenser: Summary + Condenser->>Agent: Updated context + Agent->>Agent: Continue with condensed history +``` + +### What Gets Condensed + +- **System messages**: Preserved as-is +- **Recent messages**: Kept in full (configurable count) +- **Older messages**: Summarized into compact form +- **Tool results**: Preserved for reference +- **User preferences**: Maintained across condensation + +## Custom Condensers + +Implement custom condensation strategies by extending the base class: + +```python +from openhands.sdk.context import CondenserBase +from openhands.sdk.event import ConversationState + +class CustomCondenser(CondenserBase): + def condense(self, state: ConversationState) -> ConversationState: + """Implement custom condensation logic.""" + # Your condensation algorithm + return condensed_state + + def should_condense(self, state: ConversationState) -> bool: + """Determine when to trigger condensation.""" + # Your trigger logic + return token_count > threshold +``` + +## Best Practices + +1. **Set Appropriate Thresholds**: Leave buffer room below actual limit +2. **Preserve Recent Context**: Keep enough messages for coherent flow +3. **Monitor Performance**: Track condensation frequency and effectiveness +4. **Test Condensation**: Verify important information isn't lost +5. 
**Adjust Per Use Case**: Different tasks need different settings + +## Configuration Guidelines + +| Use Case | max_tokens | target_tokens | preserve_recent | +|----------|-----------|---------------|-----------------| +| Short tasks | 4000 | 3000 | 5 | +| Medium conversations | 8000 | 6000 | 10 | +| Long-running agents | 16000 | 12000 | 20 | +| Code-heavy tasks | 12000 | 10000 | 15 | + +## Performance Considerations + +- **Condensation Cost**: Uses additional LLM calls +- **Latency**: Brief pause during condensation +- **Context Quality**: Trade-off between compression and information retention +- **Frequency**: Tune thresholds to minimize condensation events + +## See Also + +- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using condensers with agents +- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py)** - Working example +- **[Conversation State](/sdk/architecture/sdk/conversation.mdx)** - Managing conversation state diff --git a/sdk/arch/sdk/conversation.mdx b/sdk/arch/sdk/conversation.mdx new file mode 100644 index 00000000..e702fb36 --- /dev/null +++ b/sdk/arch/sdk/conversation.mdx @@ -0,0 +1,487 @@ +--- +title: Conversation +description: Manage agent lifecycles through structured message flows and state persistence. +--- + +The Conversation class orchestrates agent execution through structured message flows. It manages the agent lifecycle, state persistence, and provides APIs for interaction and monitoring. 
+
+**Source**: [`openhands/sdk/conversation/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/conversation)
+
+## Core Concepts
+
+```mermaid
+graph LR
+    User[User] --> Conversation[Conversation]
+    Conversation --> Agent[Agent]
+    Conversation --> State[ConversationState]
+    Conversation --> Events[Event History]
+
+    Agent --> Step["step()"]
+    State --> Persistence[Persistence]
+
+    style Conversation fill:#e1f5fe
+    style Agent fill:#f3e5f5
+    style State fill:#fff3e0
+    style Events fill:#e8f5e8
+```
+
+A conversation:
+- **Manages Agent Lifecycle**: Initializes and runs agents until completion
+- **Handles State**: Maintains conversation history and context
+- **Enables Interaction**: Send messages and receive responses
+- **Provides Persistence**: Save and restore conversation state
+- **Monitors Progress**: Track execution stats and events
+
+## Basic API
+
+**Source**: [`openhands/sdk/conversation/conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation.py)
+
+### Creating a Conversation
+
+```python
+from openhands.sdk import Conversation, Agent, LLM
+from openhands.tools import BashTool, FileEditorTool
+from pydantic import SecretStr
+
+# Create agent
+agent = Agent(
+    llm=LLM(
+        model="anthropic/claude-sonnet-4-20250514",
+        api_key=SecretStr("your-api-key")
+    ),
+    tools=[BashTool.create(), FileEditorTool.create()]
+)
+
+# Create conversation
+conversation = Conversation(
+    agent=agent,
+    workspace="workspace/project", # Working directory
+    persistence_dir="conversations", # Save conversation state
+    max_iteration_per_run=500, # Max steps per run
+    stuck_detection=True, # Detect infinite loops
+    visualize=True # Generate execution visualizations
+)
+```
+
+### Constructor Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `agent` | `AgentBase` | *Required* | Agent to run in the conversation |
+| `workspace` | `str \| LocalWorkspace \|
RemoteWorkspace` | `"workspace/project"` | Execution environment | +| `persistence_dir` | `str \| None` | `None` | Directory for saving state | +| `conversation_id` | `ConversationID \| None` | `None` | Resume existing conversation | +| `callbacks` | `list[ConversationCallbackType] \| None` | `None` | Event callbacks | +| `max_iteration_per_run` | `int` | `500` | Maximum steps per `run()` call | +| `stuck_detection` | `bool` | `True` | Enable stuck detection | +| `visualize` | `bool` | `True` | Generate visualizations | +| `secrets` | `dict \| None` | `None` | Secret values for agent | + +## Agent Lifecycle + +```mermaid +sequenceDiagram + participant User + participant Conversation + participant Agent + participant State + + User->>Conversation: Create conversation(agent) + Conversation->>State: Initialize state + Conversation->>Agent: init_state() + + User->>Conversation: send_message("Task") + Conversation->>State: Add message event + + User->>Conversation: run() + loop Until agent finishes or max iterations + Conversation->>Agent: step(state) + Agent->>State: Update with actions/observations + Conversation->>User: Callback with events + end + + User->>Conversation: agent_final_response() + Conversation->>User: Return final result +``` + +### 1. Create Agent + +Define agent with LLM and tools: + +```python +agent = Agent(llm=llm, tools=tools) +``` + +### 2. Create Conversation + +Pass agent to conversation: + +```python +conversation = Conversation(agent=agent) +``` + +### 3. Send Messages + +Add user messages to conversation: + +```python +conversation.send_message("Build a web scraper for news articles") +``` + +### 4. Run Agent + +Execute agent until task completion: + +```python +conversation.run() +``` + +The conversation will call `agent.step(state)` repeatedly until: +- Agent calls `FinishTool` +- Maximum iterations reached +- Agent encounters an error +- User pauses execution + +### 5. 
Get Results + +Retrieve agent's final response: + +```python +result = conversation.agent_final_response() +print(result) +``` + +## Core Methods + +**Source**: [`openhands/sdk/conversation/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/base.py) + +### send_message() + +Add a message to the conversation: + +```python +# String message +conversation.send_message("Write unit tests for the API") + +# Message object with images +from openhands.sdk.llm import Message, ImageContent + +message = Message( + role="user", + content=[ + "What's in this image?", + ImageContent(source="path/to/image.png") + ] +) +conversation.send_message(message) +``` + +See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py). + +### run() + +Execute the agent until completion or max iterations: + +```python +# Synchronous execution +conversation.run() + +# Async execution +await conversation.run() +``` + +See [`examples/01_standalone_sdk/11_async.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) for async usage. + +### agent_final_response() + +Get the agent's final response: + +```python +final_response = conversation.agent_final_response() +``` + +### pause() + +Pause agent execution: + +```python +conversation.pause() +``` + +See [`examples/01_standalone_sdk/09_pause_example.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py). 
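`pause()` takes effect between agent steps rather than interrupting a step mid-flight, which is why it is typically called from a callback or another thread while `run()` is in progress. A minimal self-contained sketch of that cooperative pattern (stand-in code, not the SDK's implementation):

```python
import threading
import time

class PausableRunner:
    """Stand-in for Conversation.run()/pause(): a loop that checks a flag between steps."""

    def __init__(self, max_iterations: int = 100):
        self.max_iterations = max_iterations
        self._pause_requested = threading.Event()
        self.iterations_completed = 0

    def pause(self) -> None:
        # Safe to call from another thread while run() is executing
        self._pause_requested.set()

    def run(self) -> None:
        for _ in range(self.max_iterations):
            if self._pause_requested.is_set():
                break  # stop cleanly at a step boundary
            time.sleep(0.01)  # stand-in for one agent step
            self.iterations_completed += 1

runner = PausableRunner()
worker = threading.Thread(target=runner.run)
worker.start()
time.sleep(0.05)  # let a few steps run
runner.pause()    # request a pause from the main thread
worker.join()
print(runner.iterations_completed < runner.max_iterations)  # True: stopped early
```

Because the flag is only checked between steps, a pause never leaves a tool action half-executed.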
+ +### close() + +Clean up resources: + +```python +conversation.close() +``` + +## Conversation State + +**Source**: [`openhands/sdk/conversation/state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/state.py) + +### Accessing State + +```python +state = conversation.state + +# Conversation properties +print(state.id) # Unique conversation ID +print(state.agent_status) # Current execution status +print(state.events) # Event history + +# Agent and workspace +print(state.agent) # The agent instance +print(state.workspace) # The workspace +``` + +### Agent Execution Status + +```python +from openhands.sdk.conversation.state import AgentExecutionStatus + +status = state.agent_status + +# Possible values: +# - AgentExecutionStatus.IDLE +# - AgentExecutionStatus.RUNNING +# - AgentExecutionStatus.FINISHED +# - AgentExecutionStatus.ERROR +# - AgentExecutionStatus.PAUSED +``` + +## Persistence + +### Saving Conversations + +Conversations are automatically persisted when `persistence_dir` is set: + +```python +conversation = Conversation( + agent=agent, + persistence_dir="conversations" # Saves to conversations// +) +``` + +See [`examples/01_standalone_sdk/10_persistence.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py). 
+ +### Resuming Conversations + +Resume from a saved conversation ID: + +```python +from openhands.sdk.conversation.types import ConversationID + +# Get conversation ID +conv_id = conversation.id + +# Later, resume with the same ID +resumed_conversation = Conversation( + agent=agent, + conversation_id=conv_id, + persistence_dir="conversations" +) +``` + +## Monitoring and Stats + +**Source**: [`openhands/sdk/conversation/conversation_stats.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation_stats.py) + +### Conversation Stats + +```python +stats = conversation.conversation_stats + +print(stats.total_messages) # Total messages exchanged +print(stats.total_tokens) # Total tokens used +print(stats.total_cost) # Estimated cost +print(stats.duration) # Execution time +``` + +See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py). + +## Event Callbacks + +### Registering Callbacks + +Monitor conversation events in real-time: + +```python +from openhands.sdk.conversation import ConversationCallbackType +from openhands.sdk.event import Event + +def on_event(event: Event): + if isinstance(event, MessageEvent): + print(f"Message: {event.content}") + elif isinstance(event, ActionEvent): + print(f"Action: {event.action.kind}") + elif isinstance(event, ObservationEvent): + print(f"Observation: {event.observation.kind}") + +conversation = Conversation( + agent=agent, + callbacks=[on_event] +) +``` + +## Advanced Features + +### Stuck Detection + +**Source**: [`openhands/sdk/conversation/stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/stuck_detector.py) + +Automatically detects when agents are stuck in loops: + +```python +conversation = Conversation( + agent=agent, + stuck_detection=True # Default: True +) +``` + +See 
[`examples/01_standalone_sdk/20_stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py). + +### Secrets Management + +**Source**: [`openhands/sdk/conversation/secrets_manager.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/secrets_manager.py) + +Provide secrets for agent operations: + +```python +conversation = Conversation( + agent=agent, + secrets={ + "API_KEY": "secret-value", + "DATABASE_URL": "postgres://..." + } +) + +# Update secrets during execution +conversation.update_secrets({ + "NEW_TOKEN": "new-value" +}) +``` + +See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). + +### Visualization + +**Source**: [`openhands/sdk/conversation/visualizer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/visualizer.py) + +Generate execution visualizations: + +```python +conversation = Conversation( + agent=agent, + visualize=True # Default: True +) + +# Visualizations saved to workspace/visualizations/ +``` + +### Title Generation + +Generate conversation titles: + +```python +title = conversation.generate_title(max_length=50) +print(f"Conversation: {title}") +``` + +## Local vs Remote Conversations + +### LocalConversation + +**Source**: [`openhands/sdk/conversation/impl/local_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/local_conversation.py) + +Runs agent locally: + +```python +from openhands.sdk.workspace import LocalWorkspace + +conversation = Conversation( + agent=agent, + workspace=LocalWorkspace(working_dir="/project") +) +``` + +### RemoteConversation + +**Source**: [`openhands/sdk/conversation/impl/remote_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/remote_conversation.py) + +Runs agent on remote server: + 
+```python +from openhands.workspace import RemoteAPIWorkspace + +conversation = Conversation( + agent=agent, + workspace=RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com" + ) +) +``` + +## Best Practices + +1. **Set Appropriate Iteration Limits**: Prevent runaway executions +2. **Use Persistence**: Save important conversations for resume/replay +3. **Monitor Events**: Use callbacks for real-time monitoring +4. **Handle Errors**: Check agent status and handle failures gracefully +5. **Clean Up Resources**: Call `close()` when done +6. **Enable Stuck Detection**: Catch infinite loops early +7. **Track Stats**: Monitor token usage and costs + +## Complete Example + +```python +from openhands.sdk import Conversation, Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from pydantic import SecretStr + +# Create agent +agent = Agent( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ), + tools=[BashTool.create(), FileEditorTool.create()] +) + +# Create conversation +conversation = Conversation( + agent=agent, + workspace="workspace/project", + persistence_dir="conversations", + max_iteration_per_run=100 +) + +try: + # Send task + conversation.send_message("Create a simple REST API") + + # Run agent + conversation.run() + + # Get result + result = conversation.agent_final_response() + print(f"Result: {result}") + + # Check stats + stats = conversation.conversation_stats + print(f"Tokens used: {stats.total_tokens}") + print(f"Cost: ${stats.total_cost}") +finally: + # Clean up + conversation.close() +``` + +## See Also + +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration and usage +- **[Events](/sdk/architecture/sdk/event.mdx)** - Event types and handling +- **[Workspace](/sdk/architecture/sdk/workspace.mdx)** - Workspace configuration +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - Usage examples diff 
--git a/sdk/arch/sdk/event.mdx b/sdk/arch/sdk/event.mdx new file mode 100644 index 00000000..a286dab0 --- /dev/null +++ b/sdk/arch/sdk/event.mdx @@ -0,0 +1,403 @@ +--- +title: Event System +description: Structured event types representing agent actions, observations, and system messages in conversations. +--- + +The event system provides structured representations of all interactions in agent conversations. Events enable state management, LLM communication, and real-time monitoring. + +**Source**: [`openhands/sdk/event/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/event) + +## Core Concepts + +```mermaid +graph TD + Event[Event] --> LLMConvertible[LLMConvertibleEvent] + Event --> NonConvertible[Non-LLM Events] + + LLMConvertible --> Action[ActionEvent] + LLMConvertible --> Observation[ObservationEvent] + LLMConvertible --> Message[MessageEvent] + LLMConvertible --> System[SystemPromptEvent] + + NonConvertible --> State[StateUpdateEvent] + NonConvertible --> User[UserActionEvent] + NonConvertible --> Condenser[CondenserEvent] + + style Event fill:#e1f5fe + style LLMConvertible fill:#fff3e0 + style NonConvertible fill:#e8f5e8 +``` + +Events fall into two categories: +- **LLMConvertibleEvent**: Events that become LLM messages +- **Non-LLM Events**: Internal state and control events + +## Base Event Classes + +**Source**: [`openhands/sdk/event/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/base.py) + +### Event + +Base class for all events: + +```python +from openhands.sdk.event import Event + +class Event: + id: str # Unique event identifier + timestamp: str # ISO format timestamp + source: SourceType # Event source (agent/user/system) +``` + +**Properties**: +- **Immutable**: Events are frozen Pydantic models +- **Serializable**: Full event data can be saved/restored +- **Visualizable**: Rich text representation for display + +### LLMConvertibleEvent + +Events that can be converted to LLM messages: + 
+```python +from openhands.sdk.event import LLMConvertibleEvent +from openhands.sdk.llm import Message + +class LLMConvertibleEvent(Event): + def to_llm_message(self) -> Message: + """Convert event to LLM message format.""" + ... +``` + +These events form the conversation history sent to the LLM. + +## LLM-Convertible Events + +### ActionEvent + +**Source**: [`openhands/sdk/event/llm_convertible/action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/action.py) + +Represents actions taken by the agent: + +```python +from openhands.sdk.event import ActionEvent +from openhands.sdk.tool import Action + +class ActionEvent(LLMConvertibleEvent): + action: Action # The action being executed + thought: str # Agent's reasoning (optional) +``` + +**Purpose**: Records what the agent decided to do. + +**Example**: +```python +from openhands.tools import BashAction + +action_event = ActionEvent( + source="agent", + action=BashAction(command="ls -la"), + thought="List files to understand directory structure" +) +``` + +### ObservationEvent + +**Source**: [`openhands/sdk/event/llm_convertible/observation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/observation.py) + +Represents observations from tool execution: + +```python +from openhands.sdk.event import ObservationEvent +from openhands.sdk.tool import Observation + +class ObservationEvent(LLMConvertibleEvent): + observation: Observation # Tool execution result +``` + +**Purpose**: Records the outcome of agent actions. 
+
+**Example**:
+```python
+from openhands.tools import BashObservation
+
+observation_event = ObservationEvent(
+    source="tool",
+    observation=BashObservation(
+        output="file1.txt\nfile2.py\n",
+        exit_code=0
+    )
+)
+```
+
+**Related Events**:
+- **AgentErrorEvent**: Agent execution errors
+- **UserRejectObservation**: User rejected an action
+
+### MessageEvent
+
+**Source**: [`openhands/sdk/event/llm_convertible/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/message.py)
+
+Represents messages in the conversation:
+
+```python
+from openhands.sdk.event import MessageEvent
+
+class MessageEvent(LLMConvertibleEvent):
+    content: str | list  # Message content (text or multimodal)
+    role: str  # Role: "user", "assistant", "system"
+    images_urls: list[str]  # Optional image URLs
+```
+
+**Purpose**: User messages, agent responses, and system messages.
+
+**Example**:
+```python
+message_event = MessageEvent(
+    source="user",
+    content="Create a web scraper",
+    role="user"
+)
+```
+
+### SystemPromptEvent
+
+**Source**: [`openhands/sdk/event/llm_convertible/system.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/system.py)
+
+Represents system prompts:
+
+```python
+from openhands.sdk.event import SystemPromptEvent
+
+class SystemPromptEvent(LLMConvertibleEvent):
+    content: str  # System prompt content
+```
+
+**Purpose**: Provides instructions and context to the agent.
+
+## Non-LLM Events
+
+### ConversationStateUpdateEvent
+
+**Source**: [`openhands/sdk/event/conversation_state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/conversation_state.py)
+
+Tracks conversation state changes:
+
+```python
+from openhands.sdk.event import ConversationStateUpdateEvent
+
+class ConversationStateUpdateEvent(Event):
+    # Internal state update event; not sent to the LLM
+    ...
+```
+
+**Purpose**: Internal tracking of conversation state transitions.
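The practical consequence of the LLM/non-LLM split is that only `LLMConvertibleEvent` subclasses ever reach the model; internal events like state updates stay out of the prompt. A self-contained sketch of that filtering with stand-in classes (the real SDK events are Pydantic models with more fields):

```python
from dataclasses import dataclass

@dataclass
class Event:  # stand-in for the SDK base event
    source: str

@dataclass
class ChatEvent(Event):  # stand-in for an LLMConvertibleEvent
    content: str

    def to_llm_message(self) -> dict:
        return {"role": self.source, "content": self.content}

@dataclass
class StateUpdateEvent(Event):  # internal-only: never sent to the LLM
    status: str

history = [
    ChatEvent(source="user", content="Create a web scraper"),
    StateUpdateEvent(source="system", status="running"),
    ChatEvent(source="assistant", content="Starting on the scraper."),
]

# Only convertible events contribute to the LLM context
llm_messages = [e.to_llm_message() for e in history if isinstance(e, ChatEvent)]
print(len(llm_messages))  # 2
```

The same `isinstance()` filtering is what event callbacks and history queries rely on.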
+ +### PauseEvent + +**Source**: [`openhands/sdk/event/user_action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/user_action.py) + +User paused the conversation: + +```python +from openhands.sdk.event import PauseEvent + +class PauseEvent(Event): + pass +``` + +**Purpose**: Signal that user has paused agent execution. + +### Condenser Events + +**Source**: [`openhands/sdk/event/condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/condenser.py) + +Track context condensation: + +#### Condensation + +```python +class Condensation(Event): + content: str # Condensed summary +``` + +**Purpose**: Record the condensed conversation history. + +#### CondensationRequest + +```python +class CondensationRequest(Event): + pass +``` + +**Purpose**: Request context condensation. + +#### CondensationSummaryEvent + +```python +class CondensationSummaryEvent(LLMConvertibleEvent): + content: str # Summary for LLM +``` + +**Purpose**: Provide condensed context to LLM. 
+ +## Event Source Types + +**Source**: [`openhands/sdk/event/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/types.py) + +```python +SourceType = Literal["agent", "user", "tool", "system"] +``` + +- **agent**: Events from the agent +- **user**: Events from the user +- **tool**: Events from tool execution +- **system**: System-generated events + +## Event Streams + +### Converting to LLM Messages + +Events are converted to LLM messages for context: + +```python +from openhands.sdk.event import LLMConvertibleEvent + +events = [action_event, observation_event, message_event] +messages = LLMConvertibleEvent.events_to_messages(events) + +# Send to LLM +response = llm.completion(messages=messages) +``` + +### Event Batching + +Multiple actions in a single step are batched: + +```python +# Multi-action events +action1 = ActionEvent(action=BashAction(...)) +action2 = ActionEvent(action=FileEditAction(...)) + +# Converted to single LLM message with multiple tool calls +messages = LLMConvertibleEvent.events_to_messages([action1, action2]) +``` + +## Event Visualization + +Events support rich text visualization: + +```python +from openhands.sdk.event import Event + +event = MessageEvent( + source="user", + content="Hello", + role="user" +) + +# Rich text representation +print(event.visualize) + +# Plain text +print(str(event)) +# Output: MessageEvent (user) +# user: Hello +``` + +## Event Callbacks + +Monitor events in real-time: + +```python +from openhands.sdk import Conversation +from openhands.sdk.event import ( + Event, + ActionEvent, + ObservationEvent, + MessageEvent +) + +def on_event(event: Event): + if isinstance(event, MessageEvent): + print(f"šŸ’¬ Message: {event.content}") + elif isinstance(event, ActionEvent): + print(f"šŸ”§ Action: {event.action.kind}") + elif isinstance(event, ObservationEvent): + print(f"šŸ‘ļø Observation: {event.observation.content}") + +conversation = Conversation( + agent=agent, + callbacks=[on_event] 
+) +``` + +## Event History + +Access conversation event history: + +```python +conversation = Conversation(agent=agent) +conversation.send_message("Task") +conversation.run() + +# Get all events +events = conversation.state.events + +# Filter by type +actions = [e for e in events if isinstance(e, ActionEvent)] +observations = [e for e in events if isinstance(e, ObservationEvent)] +messages = [e for e in events if isinstance(e, MessageEvent)] +``` + +## Serialization + +Events are fully serializable: + +```python +# Serialize event +event_json = event.model_dump_json() + +# Deserialize +from openhands.sdk.event import Event +restored_event = Event.model_validate_json(event_json) +``` + +## Best Practices + +1. **Use Type Guards**: Check event types with `isinstance()` +2. **Handle All Types**: Cover all event types in callbacks +3. **Preserve Immutability**: Never mutate event objects +4. **Log Events**: Keep event history for debugging +5. **Filter Strategically**: Process only relevant events +6. 
**Visualize for Debugging**: Use `event.visualize` for rich output + +## Event Lifecycle + +```mermaid +sequenceDiagram + participant User + participant Conversation + participant Agent + participant Events + + User->>Conversation: send_message() + Conversation->>Events: MessageEvent + + Conversation->>Agent: step() + Agent->>Events: ActionEvent(s) + + Agent->>Tool: Execute + Tool->>Events: ObservationEvent(s) + + Events->>LLM: Convert to messages + LLM->>Agent: Generate response + + Agent->>Events: New ActionEvent(s) +``` + +## See Also + +- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing conversations and event streams +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent execution and event generation +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool actions and observations +- **[Condenser](/sdk/architecture/sdk/condenser.mdx)** - Context condensation events diff --git a/sdk/arch/sdk/llm.mdx b/sdk/arch/sdk/llm.mdx new file mode 100644 index 00000000..3a418d92 --- /dev/null +++ b/sdk/arch/sdk/llm.mdx @@ -0,0 +1,416 @@ +--- +title: LLM Integration +description: Language model integration supporting multiple providers through LiteLLM with built-in retry logic and metrics tracking. +--- + +The LLM class provides a unified interface for language model integration, supporting multiple providers through [LiteLLM](https://docs.litellm.ai/). It handles authentication, retries, metrics tracking, and streaming responses. 
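The retry handling mentioned above follows an exponential-backoff schedule driven by the `num_retries`, `retry_min_wait`, `retry_max_wait`, and `retry_multiplier` settings covered in the configuration table on this page. A sketch of how such a schedule grows (the exact formula is an illustrative assumption; the SDK delegates the details to its retry logic):

```python
def retry_waits(num_retries: int = 8, retry_min_wait: int = 3,
                retry_max_wait: int = 60, retry_multiplier: float = 2.0) -> list[float]:
    """Wait before attempt n: min_wait * multiplier**n, capped at max_wait."""
    return [
        min(retry_min_wait * retry_multiplier ** attempt, retry_max_wait)
        for attempt in range(num_retries)
    ]

print(retry_waits())
# [3.0, 6.0, 12.0, 24.0, 48.0, 60.0, 60.0, 60.0]
```

The cap keeps late retries from waiting unboundedly long while still spacing out requests under sustained rate limiting.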
+
+**Source**: [`openhands/sdk/llm/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/llm)
+
+## Core Concepts
+
+```mermaid
+graph LR
+    LLM[LLM] --> Completion["completion()"]
+    LLM --> Metrics[Metrics Tracking]
+    LLM --> Retry[Retry Logic]
+
+    Completion --> Provider[Provider API]
+    Provider --> OpenAI[OpenAI]
+    Provider --> Anthropic[Anthropic]
+    Provider --> Others[Other Providers]
+
+    style LLM fill:#e1f5fe
+    style Completion fill:#fff3e0
+    style Metrics fill:#e8f5e8
+    style Retry fill:#f3e5f5
+```
+
+## Basic Usage
+
+**Source**: [`openhands/sdk/llm/llm.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/llm.py)
+
+### Creating an LLM
+
+```python
+from openhands.sdk import LLM
+from pydantic import SecretStr
+
+# Basic configuration
+llm = LLM(
+    model="anthropic/claude-sonnet-4-20250514",
+    api_key=SecretStr("your-api-key")
+)
+
+# With custom settings
+llm = LLM(
+    model="openai/gpt-4",
+    api_key=SecretStr("your-api-key"),
+    base_url="https://api.openai.com/v1",
+    temperature=0.7,
+    max_tokens=4096,
+    timeout=60.0
+)
+```
+
+### Configuration Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `model` | `str` | `"claude-sonnet-4-20250514"` | Model identifier |
+| `api_key` | `SecretStr \| None` | `None` | API key for authentication |
+| `base_url` | `str \| None` | `None` | Custom API endpoint |
+| `temperature` | `float` | `0.0` | Sampling temperature (0-2) |
+| `max_tokens` | `int \| None` | `None` | Maximum tokens to generate |
+| `timeout` | `float` | `60.0` | Request timeout in seconds |
+| `num_retries` | `int` | `8` | Number of retry attempts |
+| `retry_min_wait` | `int` | `3` | Minimum retry wait (seconds) |
+| `retry_max_wait` | `int` | `60` | Maximum retry wait (seconds) |
+| `retry_multiplier` | `float` | `2.0` | Retry backoff multiplier |
+
+## Generating Completions
+
+### Basic Completion
+
+```python
+from openhands.sdk.llm import Message
+
+messages = [ + Message(role="user", content="What is the capital of France?") +] + +response = llm.completion(messages=messages) +print(response.choices[0].message.content) +# Output: "The capital of France is Paris." +``` + +### With Tool Calling + +```python +from openhands.sdk import Agent +from openhands.tools import BashTool + +# Tools are automatically converted to function schemas +agent = Agent( + llm=llm, + tools=[BashTool.create()] +) + +# LLM receives tool schemas and can call them +``` + +### Streaming Responses + +```python +# Enable streaming +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key"), + stream=True +) + +# Stream response chunks +for chunk in llm.completion(messages=messages): + if chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="") +``` + +## Model Providers + +The SDK supports all providers available in LiteLLM: + +### Anthropic + +```python +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("sk-ant-...") +) +``` + +### OpenAI + +```python +llm = LLM( + model="openai/gpt-4", + api_key=SecretStr("sk-...") +) +``` + +### Azure OpenAI + +```python +llm = LLM( + model="azure/gpt-4", + api_key=SecretStr("your-azure-key"), + api_base="https://your-resource.openai.azure.com", + api_version="2024-02-01" +) +``` + +### Custom Providers + +```python +llm = LLM( + model="custom-provider/model-name", + base_url="https://custom-api.example.com/v1", + api_key=SecretStr("your-api-key") +) +``` + +See [LiteLLM providers](https://docs.litellm.ai/docs/providers) for full list. + +## LLM Registry + +**Source**: Use pre-configured LLM instances from registry. 
+
+See [`examples/01_standalone_sdk/05_use_llm_registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py):
+
+```python
+from openhands.sdk.llm.registry import get_llm
+
+# Get pre-configured LLM
+llm = get_llm(
+    model_name="claude-sonnet-4",
+    # Configuration from environment or defaults
+)
+```
+
+## Metrics and Monitoring
+
+### Tracking Metrics
+
+**Source**: [`openhands/sdk/llm/utils/metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/utils/metrics.py)
+
+```python
+# Get metrics snapshot
+metrics = llm.metrics.snapshot()
+
+print(f"Total tokens: {metrics.accumulated_token_usage}")
+print(f"Total cost: ${metrics.accumulated_cost}")
+print(f"Requests: {metrics.total_requests}")
+```
+
+See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py).
+
+### Cost Tracking
+
+```python
+from openhands.sdk.conversation import Conversation
+
+conversation = Conversation(agent=Agent(llm=llm, tools=tools))
+conversation.send_message("Task")
+conversation.run()
+
+# Get conversation stats
+stats = conversation.conversation_stats
+print(f"Total tokens: {stats.total_tokens}")
+print(f"Estimated cost: ${stats.total_cost}")
+```
+
+## Advanced Features
+
+### LLM Routing
+
+**Source**: Route between different LLMs based on criteria.
+
+See [`examples/01_standalone_sdk/19_llm_routing.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py):
+
+```python
+# Use different LLMs for different tasks
+fast_llm = LLM(model="openai/gpt-4o-mini", api_key=SecretStr("..."))
+powerful_llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key=SecretStr("..."))
+
+# Route based on task complexity (task_is_simple is your own heuristic)
+if task_is_simple:
+    agent = Agent(llm=fast_llm, tools=tools)
+else:
+    agent = Agent(llm=powerful_llm, tools=tools)
+```
+
+### Model Reasoning
+
+Access model reasoning from Anthropic extended thinking blocks and the OpenAI Responses API.
+
+See [`examples/01_standalone_sdk/22_model_reasoning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/22_model_reasoning.py):
+
+```python
+# Enable Anthropic extended thinking
+llm = LLM(
+    model="anthropic/claude-sonnet-4-20250514",
+    api_key=SecretStr("your-api-key"),
+    thinking={"type": "enabled", "budget_tokens": 1000}
+)
+
+# Or use the OpenAI Responses API for reasoning
+llm = LLM(
+    model="openai/gpt-5-codex",
+    api_key=SecretStr("your-api-key"),
+    reasoning_effort="high"
+)
+```
+
+## Error Handling
+
+### Automatic Retries
+
+The LLM class automatically retries on transient failures:
+
+```python
+from litellm.exceptions import RateLimitError, APIConnectionError
+
+# These exceptions trigger automatic retry:
+# - APIConnectionError
+# - RateLimitError
+# - ServiceUnavailableError
+# - Timeout
+# - InternalServerError
+
+# Configure retry behavior
+llm = LLM(
+    model="anthropic/claude-sonnet-4-20250514",
+    api_key=SecretStr("your-api-key"),
+    num_retries=8,        # Number of retries
+    retry_min_wait=3,     # Min wait between retries (seconds)
+    retry_max_wait=60,    # Max wait between retries (seconds)
+    retry_multiplier=2.0  # Exponential backoff multiplier
+)
+```
+
+### Exception Handling
+
+```python
+from litellm.exceptions import (
+    RateLimitError,
+    ContextWindowExceededError,
+    BadRequestError
+) + +try: + response = llm.completion(messages=messages) +except RateLimitError: + print("Rate limit exceeded, automatic retry in progress") +except ContextWindowExceededError: + print("Context window exceeded, reduce message history") +except BadRequestError as e: + print(f"Bad request: {e}") +``` + +## Message Types + +**Source**: [`openhands/sdk/llm/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py) + +### Text Messages + +```python +from openhands.sdk.llm import Message + +message = Message( + role="user", + content="Hello, how are you?" +) +``` + +### Multimodal Messages + +```python +from openhands.sdk.llm import Message, ImageContent + +message = Message( + role="user", + content=[ + "What's in this image?", + ImageContent(source="path/to/image.png") + ] +) +``` + +See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py). + +### Tool Call Messages + +```python +from openhands.sdk.llm import Message, MessageToolCall + +# Message with tool calls +message = Message( + role="assistant", + content="Let me run that command", + tool_calls=[ + MessageToolCall( + id="call_123", + function={"name": "execute_bash", "arguments": '{"command": "ls"}'} + ) + ] +) +``` + +## Model Features + +### Vision Support + +```python +from litellm.utils import supports_vision + +if supports_vision(llm.model): + # Model supports image inputs + message = Message( + role="user", + content=["Describe this image", ImageContent(source="image.png")] + ) +``` + +### Token Counting + +```python +from litellm.utils import token_counter + +# Count tokens in messages +messages = [Message(role="user", content="Hello world")] +tokens = token_counter(model=llm.model, messages=messages) +print(f"Message uses {tokens} tokens") +``` + +### Model Information + +```python +from litellm.utils import get_model_info + +info = get_model_info(llm.model) +print(f"Max 
tokens: {info['max_tokens']}") +print(f"Cost per token: {info['input_cost_per_token']}") +``` + +## Best Practices + +1. **Set Appropriate Timeouts**: Adjust based on expected response time +2. **Configure Retries**: Balance reliability with latency requirements +3. **Monitor Metrics**: Track token usage and costs +4. **Handle Exceptions**: Implement proper error handling +5. **Use Streaming**: For better user experience with long responses +6. **Secure API Keys**: Use `SecretStr` and environment variables +7. **Choose Right Model**: Balance cost, speed, and capability + +## Environment Variables + +Configure LLM via environment variables: + +```bash +# API keys +export ANTHROPIC_API_KEY="sk-ant-..." +export OPENAI_API_KEY="sk-..." +export AZURE_API_KEY="..." + +# Custom endpoints +export OPENAI_API_BASE="https://custom-endpoint.com" + +# Model defaults +export LLM_MODEL="anthropic/claude-sonnet-4-20250514" +``` + +## See Also + +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Using LLMs with agents +- **[Message Types](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py)** - Message structure +- **[LiteLLM Documentation](https://docs.litellm.ai/)** - Provider details +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - LLM usage examples diff --git a/sdk/arch/sdk/mcp.mdx b/sdk/arch/sdk/mcp.mdx new file mode 100644 index 00000000..ea18a670 --- /dev/null +++ b/sdk/arch/sdk/mcp.mdx @@ -0,0 +1,333 @@ +--- +title: MCP Integration +description: Connect agents to external tools and services through the Model Context Protocol. +--- + +MCP (Model Context Protocol) integration enables agents to connect to external tools and services through a standardized protocol. The SDK seamlessly converts MCP tools into native agent tools. + +**Source**: [`openhands/sdk/mcp/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/mcp) + +## What is MCP? 
+ +[Model Context Protocol](https://modelcontextprotocol.io/) is an open protocol that standardizes how AI applications connect to external data sources and tools. It enables: + +- **Standardized Integration**: Connect to any MCP-compliant service +- **Dynamic Discovery**: Tools are discovered at runtime +- **Multiple Transports**: Support for stdio, HTTP, and SSE +- **OAuth Support**: Secure authentication for external services + +## Basic Usage + +### Creating MCP Tools + +```python +from openhands.sdk import create_mcp_tools + +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } +} + +# Create MCP tools +mcp_tools = create_mcp_tools(mcp_config, timeout=30) + +# Use with agent +from openhands.sdk import Agent +from openhands.tools import BashTool + +agent = Agent( + llm=llm, + tools=[ + BashTool.create(), + *mcp_tools # Add MCP tools + ] +) +``` + +See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py). + +### Using MCP Config in Agent + +```python +# Simpler: provide MCP config directly to agent +agent = Agent( + llm=llm, + tools=[BashTool.create()], + mcp_config={ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } + } +) +``` + +## Configuration Formats + +The SDK uses the [FastMCP configuration format](https://gofastmcp.com/clients/client#configuration-format). 
+ +### Stdio Servers + +Run local MCP servers via stdio: + +```python +mcp_config = { + "mcpServers": { + "filesystem": { + "transport": "stdio", # Optional, default + "command": "python", + "args": ["./mcp-server-filesystem.py"], + "env": {"DEBUG": "true"}, + "cwd": "/path/to/server" + } + } +} +``` + +### HTTP/SSE Servers + +Connect to remote MCP servers: + +```python +mcp_config = { + "mcpServers": { + "remote_api": { + "transport": "http", # or "sse" + "url": "https://api.example.com/mcp", + "headers": { + "Authorization": "Bearer token" + } + } + } +} +``` + +### OAuth Authentication + +Authenticate with OAuth-enabled services: + +```python +mcp_config = { + "mcpServers": { + "google_drive": { + "transport": "http", + "url": "https://mcp.google.com/drive", + "auth": "oauth", # Enable OAuth flow + } + } +} +``` + +See [`examples/01_standalone_sdk/08_mcp_with_oauth.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py). + +## Available MCP Servers + +Popular MCP servers you can integrate: + +### Official Servers + +- **fetch**: HTTP requests ([mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)) +- **filesystem**: File operations ([mcp-server-filesystem](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem)) +- **git**: Git operations ([mcp-server-git](https://github.com/modelcontextprotocol/servers/tree/main/src/git)) +- **github**: GitHub API ([mcp-server-github](https://github.com/modelcontextprotocol/servers/tree/main/src/github)) +- **postgres**: PostgreSQL queries ([mcp-server-postgres](https://github.com/modelcontextprotocol/servers/tree/main/src/postgres)) + +### Community Servers + +See [MCP Servers Directory](https://github.com/modelcontextprotocol/servers) for more. 
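Multiple servers from the sections above can be combined in a single `mcpServers` mapping. A minimal plain-dict sketch of that merge — `merge_mcp_configs` is a hypothetical helper for illustration, not part of the SDK:

```python
def merge_mcp_configs(*configs: dict) -> dict:
    """Merge several FastMCP-style configs into one mcpServers mapping.

    Later configs win on server-name collisions. Plain dicts only; this
    mirrors the format shown above, it is not an SDK helper.
    """
    merged: dict = {"mcpServers": {}}
    for config in configs:
        merged["mcpServers"].update(config.get("mcpServers", {}))
    return merged


stdio_config = {
    "mcpServers": {
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}
    }
}
http_config = {
    "mcpServers": {
        "remote_api": {
            "transport": "http",
            "url": "https://api.example.com/mcp",
            "headers": {"Authorization": "Bearer token"},
        }
    }
}

combined = merge_mcp_configs(stdio_config, http_config)
print(sorted(combined["mcpServers"]))  # ['fetch', 'remote_api']
```

The combined dict can then be passed wherever a single MCP config is expected, e.g. an agent's `mcp_config` argument.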
+ +## MCP Tool Conversion + +MCP tools are automatically converted to SDK tools: + +```mermaid +graph LR + MCPServer[MCP Server] --> Discovery[Tool Discovery] + Discovery --> Schema[Tool Schema] + Schema --> SDKTool[SDK Tool] + SDKTool --> Agent[Agent] + + style MCPServer fill:#e1f5fe + style SDKTool fill:#fff3e0 + style Agent fill:#e8f5e8 +``` + +1. **Discovery**: MCP server lists available tools +2. **Schema Extraction**: Tool schemas extracted from MCP +3. **Tool Creation**: SDK tools created with proper typing +4. **Agent Integration**: Tools available to agent + +## Configuration Options + +### Timeout + +Set connection timeout for MCP servers: + +```python +mcp_tools = create_mcp_tools(mcp_config, timeout=60) # 60 seconds +``` + +### Multiple Servers + +Configure multiple MCP servers: + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "uvx", + "args": ["mcp-server-filesystem"] + }, + "github": { + "command": "uvx", + "args": ["mcp-server-github"] + } + } +} +``` + +All tools from all servers are available to the agent. + +## Error Handling + +```python +try: + mcp_tools = create_mcp_tools(mcp_config, timeout=30) +except TimeoutError: + print("MCP server connection timed out") +except Exception as e: + print(f"Failed to create MCP tools: {e}") + mcp_tools = [] # Continue without MCP tools + +agent = Agent(llm=llm, tools=[*base_tools, *mcp_tools]) +``` + +## Tool Filtering + +Filter MCP tools using regex: + +```python +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + filter_tools_regex="^fetch_.*" # Only tools starting with "fetch_" +) +``` + +## Best Practices + +1. **Set Appropriate Timeouts**: MCP servers may take time to initialize +2. **Handle Failures Gracefully**: Continue with reduced functionality if MCP fails +3. **Use Official Servers**: Start with well-tested MCP servers +4. 
**Secure Credentials**: Use environment variables for sensitive data
+5. **Test Locally First**: Verify MCP servers work before deploying
+6. **Monitor Performance**: MCP adds latency; monitor its impact
+7. **Version Pin**: Specify exact versions of MCP servers
+
+## Environment Variables
+
+Configure MCP servers via environment variables:
+
+```bash
+# GitHub MCP server
+export GITHUB_PERSONAL_ACCESS_TOKEN="ghp_..."
+
+# Google Drive OAuth
+export GOOGLE_CLIENT_ID="..."
+export GOOGLE_CLIENT_SECRET="..."
+
+# Custom MCP endpoints
+export MCP_FETCH_URL="https://custom-mcp.example.com"
+```
+
+## Advanced Usage
+
+### Custom MCP Client
+
+For advanced control, use the MCP client directly:
+
+```python
+from openhands.sdk.mcp.client import MCPClient
+
+# Create custom MCP client
+client = MCPClient(
+    server_config={
+        "command": "python",
+        "args": ["./custom-server.py"]
+    },
+    timeout=60
+)
+
+# Get tools from client
+tools = client.list_tools()
+
+# Use tools with agent
+agent = Agent(llm=llm, tools=tools)
+```
+
+## Debugging
+
+### Enable Debug Logging
+
+```python
+import logging
+
+logging.getLogger("openhands.sdk.mcp").setLevel(logging.DEBUG)
+```
+
+### Verify MCP Server
+
+Test the MCP server independently:
+
+```bash
+# Run MCP server directly
+uvx mcp-server-fetch
+
+# Check if server responds
+curl http://localhost:3000/mcp/tools
+```
+
+## Common Issues
+
+### Server Not Found
+
+```bash
+# Ensure the server is installed
+# For uvx-based servers:
+uvx --help  # Check if uvx is available
+uvx mcp-server-fetch --help  # Check if the server is available
+```
+
+### Connection Timeout
+
+```python
+# Increase timeout
+mcp_tools = create_mcp_tools(mcp_config, timeout=120)
+```
+
+### OAuth Flow Issues
+
+```bash
+# Ensure OAuth credentials are configured
+# Check that the browser opens for OAuth consent
+# Verify the redirect URL matches the configuration
+```
+
+## See Also
+
+- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Official MCP documentation
+- **[MCP
Servers](https://github.com/modelcontextprotocol/servers)** - Official server implementations +- **[FastMCP](https://gofastmcp.com/)** - Configuration format documentation +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - SDK tool system +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py)** - MCP integration examples diff --git a/sdk/arch/sdk/microagents.mdx b/sdk/arch/sdk/microagents.mdx new file mode 100644 index 00000000..00c95dd8 --- /dev/null +++ b/sdk/arch/sdk/microagents.mdx @@ -0,0 +1,225 @@ +--- +title: Microagents +description: Specialized context providers that inject targeted knowledge into agent conversations. +--- + +Microagents are specialized context providers that inject targeted knowledge into agent conversations when specific triggers are detected. They enable modular, reusable expertise without modifying the main agent. + +## What are Microagents? + +Microagents provide focused knowledge or instructions that are dynamically added to the agent's context when relevant keywords are detected in the conversation. This allows agents to access specialized expertise on-demand. + +For a comprehensive guide on using microagents, see the [official microagents documentation](https://docs.all-hands.dev/usage/prompting/microagents-overview). + +**Source**: [`openhands/sdk/context/microagents/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/microagents) + +## Microagent Types + +**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) + +The SDK provides three types of microagents, each serving a distinct purpose: + +### 1. 
KnowledgeMicroagent + +**Source**: [`openhands/sdk/context/microagents/microagent.py#L162`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L162) + +Provides specialized expertise triggered by keywords in conversations. + +**Activation Logic:** +- Contains a list of trigger keywords +- Activated when any trigger appears in conversation +- Case-insensitive matching + +**Use Cases:** +- Language best practices (Python, JavaScript, etc.) +- Framework guidelines (React, Django, etc.) +- Common patterns and anti-patterns +- Tool usage instructions + +**Example:** +```python +from openhands.sdk.context.microagents import KnowledgeMicroagent + +microagent = KnowledgeMicroagent( + name="python_testing", + content="Always use pytest for Python tests...", + triggers=["pytest", "test", "unittest"] +) + +# Triggered when message contains "pytest", "test", or "unittest" +``` + +### 2. RepoMicroagent + +**Source**: [`openhands/sdk/context/microagents/microagent.py#L191`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L191) + +Repository-specific knowledge that's always active when working with a repository. + +**Activation Logic:** +- No triggers required +- Always loaded and active for the repository +- Can define MCP tools configuration + +**Use Cases:** +- Repository-specific guidelines +- Team practices and conventions +- Project-specific workflows +- Custom documentation references + +**Special Files:** +- `.openhands_instructions` - Legacy repo instructions +- `.cursorrules` - Cursor IDE rules (auto-loaded) +- `agents.md` / `agent.md` - Agent instructions (auto-loaded) + +**Example:** +```python +from openhands.sdk.context.microagents import RepoMicroagent + +microagent = RepoMicroagent( + name="project_guidelines", + content="This project uses...", + mcp_tools={"github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]}} +) +``` + +### 3. 
TaskMicroagent + +**Source**: [`openhands/sdk/context/microagents/microagent.py#L236`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L236) + +Specialized KnowledgeMicroagent that requires user input before execution. + +**Activation Logic:** +- Triggered by `/{agent_name}` format +- Prompts user for required inputs +- Processes inputs before injecting knowledge + +**Use Cases:** +- Deployment procedures requiring credentials +- Multi-step workflows with parameters +- Interactive debugging sessions +- Customized task execution + +**Example:** +```python +from openhands.sdk.context.microagents import TaskMicroagent, InputMetadata + +microagent = TaskMicroagent( + name="deploy", + content="Deploy to {environment} with {version}...", + triggers=["/deploy"], + inputs=[ + InputMetadata(name="environment", type="string", required=True), + InputMetadata(name="version", type="string", required=True) + ] +) + +# User types: "/deploy" +# Agent prompts: "Enter environment:" "Enter version:" +# Agent proceeds with filled template +``` + +## How Microagents Work + +```mermaid +sequenceDiagram + participant User + participant Agent + participant Microagent + participant LLM + + User->>Agent: "Run the tests" + Agent->>Agent: Detect keyword "tests" + Agent->>Microagent: Fetch testing microagent + Microagent->>Agent: Return testing guidelines + Agent->>LLM: Context + guidelines + LLM->>Agent: Response with testing knowledge + Agent->>User: Execute tests with guidelines +``` + +## Using Microagents + +### Basic Usage + +```python +from openhands.sdk import Agent, AgentContext + +# Create context with microagents +context = AgentContext( + microagents=["testing_expert", "code_reviewer"] +) + +# Create agent with microagents +agent = Agent( + llm=llm, + tools=tools, + agent_context=context +) +``` + +### Example Integration + +See 
[`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py) for a complete example. + +## Microagent Structure + +**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) + +A microagent consists of: +- **Name**: Unique identifier +- **Triggers**: Keywords that activate the microagent +- **Content**: Knowledge or instructions to inject +- **Type**: One of "knowledge", "repo", or "task" + +## Response Models + +**Source**: [`openhands/sdk/context/microagents/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/types.py) + +### MicroagentKnowledge + +```python +class MicroagentKnowledge(BaseModel): + name: str # Microagent name + trigger: str # Keyword that triggered it + content: str # Injected content +``` + +### MicroagentResponse + +```python +class MicroagentResponse(BaseModel): + name: str # Microagent name + path: str # Path or identifier + created_at: datetime # Creation timestamp +``` + +### MicroagentContentResponse + +```python +class MicroagentContentResponse(BaseModel): + content: str # Full microagent content + path: str # Path or identifier + triggers: list[str] # Trigger keywords + git_provider: str | None # Git source if applicable +``` + +## Benefits + +1. **Modularity**: Separate specialized knowledge from main agent logic +2. **Reusability**: Share microagents across multiple agents +3. **Maintainability**: Update expertise without modifying agent code +4. **Context-Aware**: Only inject relevant knowledge when needed +5. **Composability**: Combine multiple microagents for comprehensive coverage + +## Best Practices + +1. **Clear Triggers**: Use specific, unambiguous trigger keywords +2. **Focused Content**: Keep microagent content concise and targeted +3. 
**Avoid Overlap**: Minimize trigger conflicts between microagents +4. **Version Control**: Store microagents in version-controlled repositories +5. **Documentation**: Document trigger keywords and intended use cases + +## See Also + +- **[Official Microagents Guide](https://docs.all-hands.dev/usage/prompting/microagents-overview)** - Comprehensive documentation +- **[Agent Context](/sdk/architecture/sdk/agent.mdx)** - Using context with agents +- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py)** - Working example diff --git a/sdk/arch/sdk/security.mdx b/sdk/arch/sdk/security.mdx new file mode 100644 index 00000000..a41264fc --- /dev/null +++ b/sdk/arch/sdk/security.mdx @@ -0,0 +1,416 @@ +--- +title: Security +description: Analyze and control agent actions through security analyzers and confirmation policies. +--- + +The security system enables control over agent actions through risk analysis and confirmation policies. It helps prevent dangerous operations while maintaining agent autonomy for safe actions. 
+ +**Source**: [`openhands/sdk/security/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/security) + +## Core Concepts + +```mermaid +graph TD + Action[Agent Action] --> Analyzer[Security Analyzer] + Analyzer --> Risk[Risk Assessment] + Risk --> Policy[Confirmation Policy] + + Policy --> Low{Risk Level} + Low -->|Low| Execute[Execute] + Low -->|Medium| MaybeConfirm[Policy Decision] + Low -->|High| Confirm[Request Confirmation] + + Confirm --> User[User Decision] + User -->|Approve| Execute + User -->|Reject| Block[Block Action] + + style Action fill:#e1f5fe + style Analyzer fill:#fff3e0 + style Policy fill:#e8f5e8 + style Execute fill:#c8e6c9 + style Block fill:#ffcdd2 +``` + +The security system consists of two components: +- **Security Analyzer**: Assesses risk level of actions +- **Confirmation Policy**: Decides when to require user confirmation + +## Security Analyzer + +### LLM Security Analyzer + +**Source**: [`openhands/sdk/security/llm_analyzer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/llm_analyzer.py) + +Uses an LLM to analyze action safety: + +```python +from openhands.sdk.security import LLMSecurityAnalyzer +from openhands.sdk import Agent, LLM +from pydantic import SecretStr + +# Create security analyzer +security_analyzer = LLMSecurityAnalyzer( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ) +) + +# Create agent with security analyzer +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer +) +``` + +### Risk Levels + +**Source**: [`openhands/sdk/security/risk.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/risk.py) + +```python +from openhands.sdk.security.risk import SecurityRisk + +# Risk levels +SecurityRisk.LOW # Safe operations (read files, list directories) +SecurityRisk.MEDIUM # Potentially impactful (write files, API calls) +SecurityRisk.HIGH # Dangerous operations (delete files, 
system changes) +``` + +### How LLM Analyzer Works + +1. **Action Inspection**: Examines the action and its parameters +2. **Context Analysis**: Considers conversation history and workspace +3. **Risk Assessment**: LLM predicts risk level with reasoning +4. **Risk Return**: Returns risk level and explanation + +```python +# Example internal flow +action = BashAction(command="rm -rf /") +risk = security_analyzer.analyze(action, context) +# Returns: SecurityRisk.HIGH, "Attempting to delete entire filesystem" +``` + +### Custom Security Analyzer + +Implement custom risk analysis: + +```python +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.tool import Action + +class PatternBasedAnalyzer(SecurityAnalyzerBase): + dangerous_patterns = ["rm -rf", "sudo", "DROP TABLE"] + + def analyze( + self, + action: Action, + context: dict + ) -> tuple[SecurityRisk, str]: + command = getattr(action, "command", "") + + for pattern in self.dangerous_patterns: + if pattern in command: + return ( + SecurityRisk.HIGH, + f"Dangerous pattern detected: {pattern}" + ) + + return SecurityRisk.LOW, "No dangerous patterns found" +``` + +## Confirmation Policies + +**Source**: [`openhands/sdk/security/confirmation_policy.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/confirmation_policy.py) + +### Built-in Policies + +#### NeverConfirm + +Never request confirmation (default): + +```python +from openhands.sdk.security import NeverConfirm + +agent = Agent( + llm=llm, + tools=tools, + confirmation_policy=NeverConfirm() +) +``` + +#### AlwaysConfirm + +Always request confirmation: + +```python +from openhands.sdk.security import AlwaysConfirm + +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=AlwaysConfirm() +) +``` + +#### ConfirmOnHighRisk + +Confirm only high-risk actions: + +```python +from openhands.sdk.security import 
ConfirmOnHighRisk + +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=ConfirmOnHighRisk() +) +``` + +#### ConfirmOnMediumOrHighRisk + +Confirm medium and high-risk actions: + +```python +from openhands.sdk.security import ConfirmOnMediumOrHighRisk + +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=ConfirmOnMediumOrHighRisk() +) +``` + +### Custom Confirmation Policy + +Implement custom confirmation logic: + +```python +from openhands.sdk.security.confirmation_policy import ConfirmationPolicyBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.tool import Action + +class TimeBasedPolicy(ConfirmationPolicyBase): + """Require confirmation during business hours.""" + + def should_confirm( + self, + action: Action, + risk: SecurityRisk, + context: dict + ) -> bool: + from datetime import datetime + + hour = datetime.now().hour + + # Business hours: always confirm high risk + if 9 <= hour <= 17: + return risk >= SecurityRisk.HIGH + + # Off hours: confirm medium and high risk + return risk >= SecurityRisk.MEDIUM +``` + +## Using Security System + +### Basic Setup + +```python +from openhands.sdk import Agent, LLM, Conversation +from openhands.sdk.security import ( + LLMSecurityAnalyzer, + ConfirmOnHighRisk +) +from pydantic import SecretStr + +# Create analyzer +security_analyzer = LLMSecurityAnalyzer( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ) +) + +# Create agent with security +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=ConfirmOnHighRisk() +) + +# Use in conversation +conversation = Conversation(agent=agent) +``` + +See [`examples/01_standalone_sdk/04_human_in_the_loop.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py). 
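The policy decision itself reduces to a comparison over ordered risk levels. A self-contained sketch of that logic — `Risk` here is a local `IntEnum` stand-in for the SDK's `SecurityRisk`, and both functions are illustrative, not SDK code:

```python
from enum import IntEnum


class Risk(IntEnum):
    """Stand-in for SecurityRisk; integer values make levels comparable."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def should_confirm_high_only(risk: Risk) -> bool:
    # Mirrors ConfirmOnHighRisk: only HIGH actions pause for approval
    return risk >= Risk.HIGH


def should_confirm_medium_or_high(risk: Risk) -> bool:
    # Mirrors ConfirmOnMediumOrHighRisk
    return risk >= Risk.MEDIUM


print(should_confirm_high_only(Risk.MEDIUM))      # False
print(should_confirm_medium_or_high(Risk.MEDIUM))  # True
```

Custom policies like the `TimeBasedPolicy` shown earlier follow the same shape: inspect the risk (and any extra context), return a boolean.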
+ +### Handling Confirmations + +```python +from openhands.sdk import Conversation +from openhands.sdk.conversation.state import AgentExecutionStatus + +conversation = Conversation(agent=agent) +conversation.send_message("Delete all temporary files") + +# Run agent +conversation.run() + +# Check if waiting for confirmation +if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: + print("Action requires confirmation:") + # Show pending action details + + # User approves + conversation.confirm_pending_action() + conversation.run() + + # Or user rejects + # conversation.reject_pending_action(reason="Too risky") +``` + +### Dynamic Policy Changes + +Change confirmation policy during execution: + +```python +from openhands.sdk.security import AlwaysConfirm, NeverConfirm + +conversation = Conversation(agent=agent) + +# Start with strict policy +conversation.set_confirmation_policy(AlwaysConfirm()) +conversation.send_message("Sensitive task") +conversation.run() + +# Switch to permissive policy +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Safe task") +conversation.run() +``` + +## Security Workflow + +```mermaid +sequenceDiagram + participant Agent + participant Analyzer + participant Policy + participant User + participant Tool + + Agent->>Analyzer: analyze(action) + Analyzer->>Analyzer: Assess risk + Analyzer->>Agent: risk + explanation + + Agent->>Policy: should_confirm(action, risk) + Policy->>Policy: Apply policy rules + + alt No confirmation needed + Policy->>Agent: execute + Agent->>Tool: Execute action + Tool->>Agent: Observation + else Confirmation required + Policy->>User: Request approval + User->>Policy: Approve/Reject + alt Approved + Policy->>Agent: execute + Agent->>Tool: Execute action + else Rejected + Policy->>Agent: block + Agent->>Agent: UserRejectObservation + end + end +``` + +## Best Practices + +1. **Use LLM Analyzer**: Provides nuanced risk assessment +2. 
**Start Conservative**: Begin with strict policies, relax as needed +3. **Monitor Blocked Actions**: Review what's being blocked +4. **Provide Context**: Better context enables better risk assessment +5. **Test Security Setup**: Verify policies work as expected +6. **Document Policies**: Explain confirmation requirements to users +7. **Handle Rejections**: Implement proper error handling for rejected actions + +## Performance Considerations + +### LLM Analyzer Overhead + +LLM security analysis adds latency: +- **Cost**: Additional LLM call per action +- **Latency**: ~1-2 seconds per analysis +- **Tokens**: ~500-1000 tokens per analysis + +```python +# Only use with confirmation policy +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, # Costs tokens + confirmation_policy=ConfirmOnHighRisk() # Must be used together +) +``` + +### Optimization Tips + +1. **Cache Similar Actions**: Reuse assessments for similar actions +2. **Use Faster Models**: Consider faster LLMs for security analysis +3. **Pattern-Based Pre-Filter**: Use pattern matching before LLM analysis +4. 
**Batch Analysis**: Analyze multiple actions together when possible + +## Security Best Practices + +### Principle of Least Privilege + +```python +# Provide only necessary tools +agent = Agent( + llm=llm, + tools=[ + FileEditorTool.create(), # Safe file operations + # Don't include BashTool for untrusted tasks + ] +) +``` + +### Sandbox Execution + +```python +# Use DockerWorkspace for isolation +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + image="ubuntu:22.04" +) + +conversation = Conversation(agent=agent, workspace=workspace) +``` + +### Secrets Management + +```python +# Provide secrets securely +conversation = Conversation( + agent=agent, + secrets={ + "API_KEY": "secret-value", + "PASSWORD": "secure-password" + } +) +``` + +See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). + +## See Also + +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration with security +- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Handling confirmations +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool security considerations +- **[Human-in-the-Loop Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py)** - Complete example diff --git a/sdk/arch/sdk/tool.mdx b/sdk/arch/sdk/tool.mdx new file mode 100644 index 00000000..3bbe737e --- /dev/null +++ b/sdk/arch/sdk/tool.mdx @@ -0,0 +1,199 @@ +--- +title: Tool System +description: Define custom tools for agents to interact with external systems through typed action/observation patterns. +--- + +The tool system enables agents to interact with external systems and perform actions. Tools follow a typed action/observation pattern with comprehensive validation and schema generation. 
+ +## Core Concepts + +```mermaid +graph LR + Action[Action] --> Tool[Tool] + Tool --> Executor[ToolExecutor] + Executor --> Observation[Observation] + + style Action fill:#e1f5fe + style Tool fill:#f3e5f5 + style Executor fill:#fff3e0 + style Observation fill:#e8f5e8 +``` + +A tool consists of three components: +- **Action**: Input schema defining tool parameters +- **ToolExecutor**: Logic that executes the tool +- **Observation**: Output schema with execution results + +**Source**: [`openhands/sdk/tool/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool) + +## Defining Custom Tools + +### 1. Define Action and Observation + +**Source**: [`openhands/sdk/tool/schema.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/schema.py) + +```python +from openhands.sdk.tool import Action, Observation + +class CalculateAction(Action): + """Action to perform calculation.""" + expression: str + precision: int = 2 + +class CalculateObservation(Observation): + """Result of calculation.""" + result: float + success: bool + error: str | None = None +``` + +### 2. Implement ToolExecutor + +**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py) + +```python +from openhands.sdk.tool import ToolExecutor + +class CalculateExecutor(ToolExecutor[CalculateAction, CalculateObservation]): + def __call__(self, action: CalculateAction) -> CalculateObservation: + try: + result = eval(action.expression) # demo only: eval is unsafe on untrusted input + return CalculateObservation( + result=round(result, action.precision), + success=True + ) + except Exception as e: + return CalculateObservation( + result=0.0, + success=False, + error=str(e) + ) +``` + +### 3.
Create Tool Class + +```python +from openhands.sdk.tool import Tool + +class CalculateTool(Tool[CalculateAction, CalculateObservation]): + name: str = "calculate" + description: str = "Evaluate mathematical expressions" + action_type: type[Action] = CalculateAction + observation_type: type[Observation] = CalculateObservation + + @classmethod + def create(cls) -> list["CalculateTool"]: + executor = CalculateExecutor() + return [cls().set_executor(executor)] +``` + +### Complete Example + +See [`examples/01_standalone_sdk/02_custom_tools.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) for a working example. + +## Built-in Tools + +**Source**: [`openhands/sdk/tool/builtins/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool/builtins) + +### FinishTool + +**Source**: [`openhands/sdk/tool/builtins/finish.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/finish.py) + +Signals task completion with optional output. + +```python +from openhands.sdk.tool.builtins import FinishTool + +# Automatically included with agents +finish_tool = FinishTool.create() +``` + +### ThinkTool + +**Source**: [`openhands/sdk/tool/builtins/think.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/think.py) + +Enables internal reasoning without external actions. 
+ +```python +from openhands.sdk.tool.builtins import ThinkTool + +# Automatically included with agents +think_tool = ThinkTool.create() +``` + +## Tool Annotations + +**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py) + +Provide hints about tool behavior following [MCP spec](https://modelcontextprotocol.io/): + +```python +from openhands.sdk.tool import ToolAnnotations + +annotations = ToolAnnotations( + title="Calculate", + readOnlyHint=True, # Tool doesn't modify environment + destructiveHint=False, # Tool doesn't perform destructive updates + idempotentHint=True, # Same input produces same output + openWorldHint=False # Tool doesn't interact with external entities +) + +class CalculateTool(Tool[CalculateAction, CalculateObservation]): + annotations: ToolAnnotations = annotations + # ... rest of tool definition +``` + +## Tool Registry + +**Source**: [`openhands/sdk/tool/registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/registry.py) + +Tools are automatically registered when defined. The registry manages tool discovery and schema generation for LLM function calling. 
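To make the schema-generation step concrete, here is a simplified, stdlib-only sketch of deriving an OpenAI-style function schema from an action class's type hints. This is not the SDK's actual registry code; `to_function_schema` and the type mapping are hypothetical:

```python
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

# Hypothetical action class standing in for a Pydantic Action model
@dataclass
class CalculateAction:
    """Evaluate mathematical expressions."""
    expression: str
    precision: int = 2

def to_function_schema(action_cls) -> dict:
    """Derive an OpenAI-style function schema from an action class."""
    hints = get_type_hints(action_cls)
    return {
        "name": action_cls.__name__,
        "description": (action_cls.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": {n: {"type": PY_TO_JSON[t]} for n, t in hints.items()},
            # Fields without defaults become required arguments
            "required": [f.name for f in fields(action_cls)
                         if f.default is MISSING and f.default_factory is MISSING],
        },
    }

schema = to_function_schema(CalculateAction)
print(schema["parameters"]["required"])  # ['expression']
```

The real registry works with Pydantic models, which can emit richer JSON schemas (field descriptions, constraints, nested types) than this minimal mapping.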
+ +## Advanced Patterns + +### Stateful Executors + +Executors can maintain state across executions: + +```python +class DatabaseExecutor(ToolExecutor[QueryAction, QueryObservation]): + def __init__(self, connection_string: str): + self.connection = connect(connection_string) + + def __call__(self, action: QueryAction) -> QueryObservation: + result = self.connection.execute(action.query) + return QueryObservation(rows=result.fetchall()) + + def close(self) -> None: + """Clean up resources.""" + self.connection.close() +``` + +### Dynamic Tool Creation + +Create tools with runtime configuration: + +```python +class ConfigurableTool(Tool[MyAction, MyObservation]): + @classmethod + def create(cls, api_key: str, endpoint: str) -> list["ConfigurableTool"]: + executor = MyExecutor(api_key=api_key, endpoint=endpoint) + return [cls().set_executor(executor)] + +# Use with different configurations +tool1 = ConfigurableTool.create(api_key="key1", endpoint="https://api1.com") +tool2 = ConfigurableTool.create(api_key="key2", endpoint="https://api2.com") +``` + +## Best Practices + +1. **Type Safety**: Use Pydantic models for actions and observations +2. **Error Handling**: Always handle exceptions in executors +3. **Resource Management**: Implement `close()` for cleanup +4. **Clear Descriptions**: Provide detailed docstrings for LLM understanding +5. 
**Validation**: Leverage Pydantic validators for input validation + +## See Also + +- **[Pre-defined Tools](/sdk/architecture/tools/)** - Ready-to-use tool implementations +- **[MCP Integration](/sdk/architecture/sdk/mcp.mdx)** - Connect to external MCP tools +- **[Agent Usage](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents diff --git a/sdk/arch/sdk/workspace.mdx b/sdk/arch/sdk/workspace.mdx new file mode 100644 index 00000000..42d61900 --- /dev/null +++ b/sdk/arch/sdk/workspace.mdx @@ -0,0 +1,322 @@ +--- +title: Workspace Interface +description: Abstract interface for agent execution environments supporting local and remote operations. +--- + +The workspace interface defines how agents interact with their execution environment. It provides a unified API for file operations and command execution, supporting both local and remote environments. + +**Source**: [`openhands/sdk/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace) + +## Core Concepts + +```mermaid +graph TD + BaseWorkspace[BaseWorkspace] --> Local[LocalWorkspace] + BaseWorkspace --> Remote[RemoteWorkspace] + + Local --> FileOps[File Operations] + Local --> CmdExec[Command Execution] + + Remote --> Docker[DockerWorkspace] + Remote --> API[RemoteAPIWorkspace] + + style BaseWorkspace fill:#e1f5fe + style Local fill:#e8f5e8 + style Remote fill:#fff3e0 +``` + +A workspace provides: +- **File Operations**: Upload, download, read, write +- **Command Execution**: Run bash commands with timeout support +- **Resource Management**: Context manager protocol for cleanup +- **Flexibility**: Local development or remote sandboxed execution + +## Base Interface + +**Source**: [`openhands/sdk/workspace/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/base.py) + +### BaseWorkspace + +Abstract base class defining the workspace interface: + +```python +from openhands.sdk.workspace import BaseWorkspace + +class 
CustomWorkspace(BaseWorkspace): + working_dir: str # Required: working directory path + + def execute_command( + self, + command: str, + cwd: str | None = None, + timeout: float = 30.0 + ) -> CommandResult: + """Execute bash command.""" + ... + + def file_upload( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + """Upload file to workspace.""" + ... + + def file_download( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + """Download file from workspace.""" + ... +``` + +### Context Manager Protocol + +All workspaces support the context manager protocol for safe resource management: + +```python +with workspace: + result = workspace.execute_command("echo 'hello'") + # Workspace automatically cleans up on exit +``` + +## LocalWorkspace + +**Source**: [`openhands/sdk/workspace/local.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/local.py) + +Executes operations directly on the local machine. + +```python +from openhands.sdk.workspace import LocalWorkspace + +workspace = LocalWorkspace(working_dir="/path/to/project") + +# Execute command +result = workspace.execute_command("ls -la") +print(result.stdout) + +# Upload file (copy) +workspace.file_upload("local_file.txt", "workspace_file.txt") + +# Download file (copy) +workspace.file_download("workspace_file.txt", "local_copy.txt") +``` + +**Use Cases**: +- Local development and testing +- Direct file system access +- No sandboxing required +- Fast execution without network overhead + +## RemoteWorkspace + +**Source**: [`openhands/sdk/workspace/remote/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace/remote) + +Abstract base for remote execution environments. 
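Because local and remote workspaces share the same `execute_command` contract, helper code can be written once against the base interface and reused unchanged. A duck-typed sketch — `run_checks`, `FakeWorkspace`, and the `CommandResult` mirror here are illustrative, not SDK classes:

```python
from dataclasses import dataclass

@dataclass
class CommandResult:  # mirrors the documented result model
    stdout: str
    stderr: str
    exit_code: int
    duration: float

def run_checks(workspace) -> bool:
    """Run a check command in any workspace implementation (duck-typed)."""
    result = workspace.execute_command("pytest -q", timeout=300.0)
    if result.exit_code != 0:
        print(f"Checks failed:\n{result.stderr}")
    return result.exit_code == 0

class FakeWorkspace:
    """Stub standing in for LocalWorkspace / DockerWorkspace in a test."""
    def execute_command(self, command: str, timeout: float = 30.0) -> CommandResult:
        return CommandResult(stdout="5 passed", stderr="", exit_code=0, duration=0.1)

print(run_checks(FakeWorkspace()))  # True
```

Swapping `FakeWorkspace` for a `LocalWorkspace` or `DockerWorkspace` instance requires no change to `run_checks`.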
+ +### RemoteWorkspace Mixin + +**Source**: [`openhands/sdk/workspace/remote/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/base.py) + +Provides common functionality for remote workspaces: +- Network communication +- File transfer protocols +- Command execution over API +- Resource cleanup + +### AsyncRemoteWorkspace + +**Source**: [`openhands/sdk/workspace/remote/async_remote_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/async_remote_workspace.py) + +Async version for concurrent operations. + +## Concrete Remote Implementations + +Remote workspace implementations are provided in the `workspace` package: + +### DockerWorkspace + +**Source**: See [workspace/docker documentation](/sdk/architecture/workspace/docker.mdx) + +Executes operations in an isolated Docker container. + +```python +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + image="ubuntu:22.04", + container_name="agent-sandbox" +) + +with workspace: + result = workspace.execute_command("python script.py") +``` + +**Benefits**: +- Strong isolation and sandboxing +- Reproducible environments +- Resource limits and security +- Clean slate for each session + +### RemoteAPIWorkspace + +**Source**: See [workspace/remote_api documentation](/sdk/architecture/workspace/remote_api.mdx) + +Connects to a remote agent server via API. 
+ +```python +from openhands.workspace import RemoteAPIWorkspace + +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com", + api_key="your-api-key" +) + +with workspace: + result = workspace.execute_command("npm test") +``` + +**Benefits**: +- Centralized agent execution +- Shared resources and caching +- Scalable architecture +- Remote monitoring and logging + +## Result Models + +**Source**: [`openhands/sdk/workspace/models.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/models.py) + +### CommandResult + +```python +class CommandResult(BaseModel): + stdout: str # Standard output + stderr: str # Standard error + exit_code: int # Exit code (0 = success) + duration: float # Execution time in seconds +``` + +### FileOperationResult + +```python +class FileOperationResult(BaseModel): + success: bool # Operation success status + message: str # Status message + path: str # File path +``` + +## Usage with Agents + +Workspaces integrate with agents through tools: + +```python +from openhands.sdk import Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from openhands.sdk.workspace import LocalWorkspace + +# Create workspace +workspace = LocalWorkspace(working_dir="/project") + +# Create tools with workspace +tools = [ + BashTool.create(working_dir=workspace.working_dir), + FileEditorTool.create() +] + +# Create agent +agent = Agent(llm=llm, tools=tools) +``` + +## Local vs Remote Comparison + +| Feature | LocalWorkspace | RemoteWorkspace | +|---------|---------------|-----------------| +| **Execution** | Local machine | Remote server/container | +| **Isolation** | None | Strong (Docker/API) | +| **Performance** | Fast | Network latency | +| **Security** | Host system | Sandboxed environment | +| **Setup** | Simple | Requires infrastructure | +| **Use Case** | Development | Production/Multi-user | + +## Advanced Usage + +### Custom Workspace Implementation + +```python +from 
openhands.sdk.workspace import BaseWorkspace +from openhands.sdk.workspace.models import CommandResult, FileOperationResult + +class CloudWorkspace(BaseWorkspace): + working_dir: str + cloud_instance_id: str + + def execute_command( + self, + command: str, + cwd: str | None = None, + timeout: float = 30.0 + ) -> CommandResult: + # Execute on cloud instance + response = self.cloud_api.run_command( + instance_id=self.cloud_instance_id, + command=command + ) + return CommandResult( + stdout=response.stdout, + stderr=response.stderr, + exit_code=response.exit_code, + duration=response.duration + ) + + def file_upload( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + # Upload to cloud storage + ... + + def file_download( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + # Download from cloud storage + ... +``` + +### Error Handling + +```python +from openhands.sdk.workspace import LocalWorkspace + +workspace = LocalWorkspace(working_dir="/project") + +try: + result = workspace.execute_command("risky_command", timeout=60.0) + if result.exit_code != 0: + print(f"Command failed: {result.stderr}") +except TimeoutError: + print("Command timed out") +except Exception as e: + print(f"Execution error: {e}") +``` + +## Best Practices + +1. **Use Context Managers**: Always use `with` statements for proper cleanup +2. **Set Appropriate Timeouts**: Prevent hanging on long-running commands +3. **Validate Working Directory**: Ensure paths exist before operations +4. **Handle Errors**: Check exit codes and handle exceptions +5. **Choose Right Workspace**: Local for development, remote for production +6. 
**Resource Limits**: Set appropriate resource limits for remote workspaces + +## See Also + +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based sandboxing +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Remote agent execution server +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace usage examples diff --git a/sdk/arch/tools/bash.mdx b/sdk/arch/tools/bash.mdx new file mode 100644 index 00000000..3497307c --- /dev/null +++ b/sdk/arch/tools/bash.mdx @@ -0,0 +1,288 @@ +--- +title: BashTool +description: Execute bash commands with persistent session support, timeout control, and environment management. +--- + +BashTool enables agents to execute bash commands in a persistent session with full control over working directory, environment variables, and execution timeout. + +**Source**: [`openhands/tools/execute_bash/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash) + +## Overview + +BashTool provides: +- Persistent bash session across multiple commands +- Environment variable management +- Timeout control for long-running commands +- Working directory configuration +- Support for both local and remote execution + +## Usage + +### Basic Usage + +```python +from openhands.tools import BashTool + +# Create tool +bash_tool = BashTool.create() + +# Use with agent +from openhands.sdk import Agent + +agent = Agent( + llm=llm, + tools=[bash_tool] +) +``` + +### With Configuration + +```python +bash_tool = BashTool.create( + working_dir="/project/path", + timeout=60.0 # 60 seconds +) +``` + +## Action Model + +**Source**: [`openhands/tools/execute_bash/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/definition.py) + +```python +class BashAction(Action): + command: str # Bash 
command to execute + thought: str = "" # Optional reasoning +``` + +### Example + +```python +from openhands.tools import BashAction + +action = BashAction( + command="ls -la", + thought="List files to understand directory structure" +) +``` + +## Observation Model + +```python +class BashObservation(Observation): + output: str # Command output (stdout + stderr) + exit_code: int # Exit code (0 = success) +``` + +### Example + +```python +# Successful execution +observation = BashObservation( + output="file1.txt\nfile2.py\n", + exit_code=0 +) + +# Failed execution +observation = BashObservation( + output="command not found: invalid_cmd\n", + exit_code=127 +) +``` + +## Features + +### Persistent Session + +Commands execute in the same bash session, preserving: +- Environment variables +- Working directory changes +- Shell state + +```python +# Set environment variable in one action +BashAction(command="export API_KEY=secret") + +# The variable persists for the next action in the same session +BashAction(command="echo $API_KEY") # Outputs: secret +``` + +### Terminal Types + +**Source**: [`openhands/tools/execute_bash/terminal/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash/terminal) + +BashTool supports multiple terminal implementations: + +- **SubprocessTerminal**: Direct subprocess execution (default) +- **TmuxTerminal**: Tmux-based persistent sessions + +### Timeout Control + +Commands automatically time out after the specified duration: + +```python +bash_tool = BashTool.create(timeout=30.0) # 30 second timeout + +# Long-running command will be terminated +action = BashAction(command="sleep 60") # Timeout after 30s +``` + +### Environment Management + +Set custom environment variables: + +```python +# Via workspace secrets +from openhands.sdk import Conversation + +conversation = Conversation( + agent=agent, + secrets={ + "DATABASE_URL": "postgres://...", + "API_KEY": "secret" + } +) +``` + +See
[`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). + +## Common Use Cases + +### File Operations + +```python +# Create directory +BashAction(command="mkdir -p /path/to/dir") + +# Copy files +BashAction(command="cp source.txt dest.txt") + +# Find files +BashAction(command="find . -name '*.py'") +``` + +### Build and Test + +```python +# Install dependencies +BashAction(command="pip install -r requirements.txt") + +# Run tests +BashAction(command="pytest tests/") + +# Build project +BashAction(command="npm run build") +``` + +### Git Operations + +```python +# Clone repository +BashAction(command="git clone https://github.com/user/repo.git") + +# Create branch +BashAction(command="git checkout -b feature-branch") + +# Commit changes +BashAction(command='git commit -m "Add feature"') +``` + +### System Information + +```python +# Check disk space +BashAction(command="df -h") + +# List processes +BashAction(command="ps aux") + +# Network information +BashAction(command="ifconfig") +``` + +## Best Practices + +1. **Set Appropriate Timeouts**: Prevent hanging on long commands +2. **Use Absolute Paths**: Or configure working directory explicitly +3. **Check Exit Codes**: Verify command success in agent logic +4. **Escape Special Characters**: Properly quote arguments +5. **Avoid Interactive Commands**: BashTool works best with non-interactive commands +6. 
**Use Security Analysis**: Enable for sensitive operations + +## Security Considerations + +### Risk Assessment + +BashTool actions have varying risk levels: + +- **LOW**: Read operations (`ls`, `cat`, `grep`) +- **MEDIUM**: Write operations (`touch`, `mkdir`, `echo >`) +- **HIGH**: Destructive operations (`rm -rf`, `sudo`, `chmod`) + +### Enable Security + +```python +from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk + +agent = Agent( + llm=llm, + tools=[BashTool.create()], + security_analyzer=LLMSecurityAnalyzer(llm=llm), + confirmation_policy=ConfirmOnHighRisk() +) +``` + +### Sandboxing + +Use DockerWorkspace for isolation: + +```python +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + image="ubuntu:22.04" +) + +conversation = Conversation(agent=agent, workspace=workspace) +``` + +## Error Handling + +### Common Exit Codes + +- `0`: Success +- `1`: General error +- `2`: Misuse of shell builtin +- `126`: Command not executable +- `127`: Command not found +- `130`: Terminated by Ctrl+C +- `137`: Killed by SIGKILL (timeout) + +### Handling Failures + +```python +# Agent can check observation +if observation.exit_code != 0: + # Handle error based on output + if "permission denied" in observation.output.lower(): + # Retry with different approach + pass +``` + +## Implementation Details + +**Source**: [`openhands/tools/execute_bash/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/impl.py) + +The tool uses a terminal interface that: +1. Initializes a persistent bash session +2. Executes commands with timeout support +3. Captures stdout and stderr +4. Returns exit codes +5. 
Handles session cleanup + +## See Also + +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For file manipulation +- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools +- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/browser_use.mdx b/sdk/arch/tools/browser_use.mdx new file mode 100644 index 00000000..bd52db73 --- /dev/null +++ b/sdk/arch/tools/browser_use.mdx @@ -0,0 +1,101 @@ +--- +title: BrowserUseTool +description: Web browsing and interaction capabilities powered by browser-use integration. +--- + +BrowserUseTool enables agents to interact with web pages, navigate websites, and extract web content through an integrated browser. + +**Source**: [`openhands/tools/browser_use/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/browser_use) + +## Overview + +BrowserUseTool provides: +- Web page navigation +- Element interaction (click, type, etc.) +- Content extraction +- Screenshot capture +- JavaScript execution + +## Usage + +```python +from openhands.tools import BrowserUseTool + +agent = Agent(llm=llm, tools=[BrowserUseTool.create()]) +``` + +## Features + +### Web Navigation + +- Navigate to URLs +- Follow links +- Browser back/forward +- Page refresh + +### Element Interaction + +- Click elements +- Fill forms +- Submit data +- Select dropdowns + +### Content Extraction + +- Extract text content +- Get element attributes +- Capture screenshots +- Parse structured data + +## Use Cases + +### Web Scraping + +```python +# Navigate to page and extract data +# Agent can use browser to: +# 1. Navigate to target URL +# 2. Wait for content to load +# 3. Extract desired information +# 4. Return structured data +``` + +### Web Testing + +```python +# Test web applications +# Agent can: +# 1. Navigate to application +# 2. Fill out forms +# 3. Click buttons +# 4. 
Verify expected behavior +``` + +### Research + +```python +# Research information online +# Agent can: +# 1. Search for information +# 2. Navigate search results +# 3. Extract relevant content +# 4. Synthesize findings +``` + +## Integration + +BrowserUseTool is powered by the [browser-use](https://github.com/browser-use/browser-use) library, providing robust web automation capabilities. + +## Best Practices + +1. **Handle Loading**: Wait for page content to load +2. **Error Handling**: Handle navigation and interaction failures +3. **Rate Limiting**: Be respectful of target websites +4. **Security**: Avoid sensitive operations in browser +5. **Timeouts**: Set appropriate timeouts for operations + +## See Also + +- **[browser-use](https://github.com/browser-use/browser-use)** - Underlying browser automation library +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For local command execution +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For processing extracted content diff --git a/sdk/arch/tools/file_editor.mdx b/sdk/arch/tools/file_editor.mdx new file mode 100644 index 00000000..fff65d25 --- /dev/null +++ b/sdk/arch/tools/file_editor.mdx @@ -0,0 +1,338 @@ +--- +title: FileEditorTool +description: Edit files with diff-based operations, undo support, and intelligent line-based modifications. +--- + +FileEditorTool provides powerful file editing capabilities with diff-based operations, undo/redo support, and intelligent line-based modifications. It's designed for precise code and text file manipulation. 
+ +**Source**: [`openhands/tools/file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/file_editor) + +## Overview + +FileEditorTool provides: +- View file contents with line numbers +- Insert, delete, and replace lines +- String-based find-and-replace +- Undo/redo support +- Automatic diff generation +- File history tracking + +## Usage + +```python +from openhands.tools import FileEditorTool + +# Create tool +file_editor = FileEditorTool.create() + +# Use with agent +agent = Agent(llm=llm, tools=[file_editor]) +``` + +## Available Commands + +**Source**: [`openhands/tools/file_editor/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/definition.py) + +### view +View file contents with line numbers. + +```python +FileEditAction( + command="view", + path="script.py" +) +``` + +Optional parameters: +- `view_range=[start, end]`: View specific line range + +### create +Create a new file with content. + +```python +FileEditAction( + command="create", + path="newfile.py", + file_text="print('Hello, World!')\n" +) +``` + +### str_replace +Replace a string in the file. + +```python +FileEditAction( + command="str_replace", + path="script.py", + old_str="old_function()", + new_str="new_function()" +) +``` + +### insert +Insert text after a specific line. + +```python +FileEditAction( + command="insert", + path="script.py", + insert_line=10, + new_str=" # New code here\n" +) +``` + +### undo_edit +Undo the last edit operation. 
+ +```python +FileEditAction( + command="undo_edit", + path="script.py" +) +``` + +## Action Model + +```python +class FileEditAction(Action): + command: Literal["view", "create", "str_replace", "insert", "undo_edit"] + path: str # File path + file_text: str | None = None # For create + old_str: str | None = None # For str_replace + new_str: str | None = None # For str_replace/insert + insert_line: int | None = None # For insert + view_range: list[int] | None = None # For view +``` + +## Observation Model + +```python +class FileEditObservation(Observation): + content: str # Result message or file content + success: bool # Operation success status + diff: str | None = None # Unified diff for changes +``` + +## Features + +### Diff Generation + +Automatic diff generation for all modifications: + +```python +# After edit +observation = FileEditObservation( + content="File edited successfully", + success=True, + diff=""" +--- script.py ++++ script.py +@@ -1,3 +1,3 @@ + def main(): +- print("old") ++ print("new") +""" +) +``` + +### Edit History + +Track file modification history with undo support: + +```python +# Edit file +action1 = FileEditAction(command="str_replace", path="file.py", ...) + +# Make another edit +action2 = FileEditAction(command="insert", path="file.py", ...) 
+ +# Undo last edit +action3 = FileEditAction(command="undo_edit", path="file.py") +``` + +### Line-Based Operations + +All operations work with line numbers for precision: + +```python +# View specific lines +FileEditAction( + command="view", + path="large_file.py", + view_range=[100, 150] # View lines 100-150 +) + +# Insert at specific line +FileEditAction( + command="insert", + path="script.py", + insert_line=25, + new_str=" new_code()\n" +) +``` + +### String Replacement + +Find and replace with exact matching: + +```python +# Must match exactly including whitespace +FileEditAction( + command="str_replace", + path="config.py", + old_str="DEBUG = False\nLOG_LEVEL = 'INFO'", + new_str="DEBUG = True\nLOG_LEVEL = 'DEBUG'" +) +``` + +## Common Use Cases + +### Creating Files + +```python +# Create Python script +FileEditAction( + command="create", + path="hello.py", + file_text="#!/usr/bin/env python3\nprint('Hello, World!')\n" +) + +# Create configuration file +FileEditAction( + command="create", + path="config.json", + file_text='{"setting": "value"}\n' +) +``` + +### Viewing Files + +```python +# View entire file +FileEditAction(command="view", path="README.md") + +# View specific section +FileEditAction( + command="view", + path="large_file.py", + view_range=[1, 50] +) + +# View end of file +FileEditAction( + command="view", + path="log.txt", + view_range=[-20, -1] # Last 20 lines +) +``` + +### Refactoring Code + +```python +# Rename function +FileEditAction( + command="str_replace", + path="module.py", + old_str="def old_name(arg):", + new_str="def new_name(arg):" +) + +# Add import +FileEditAction( + command="insert", + path="script.py", + insert_line=0, + new_str="import numpy as np\n" +) + +# Fix bug +FileEditAction( + command="str_replace", + path="buggy.py", + old_str=" if x = 5:", + new_str=" if x == 5:" +) +``` + +## Best Practices + +1. **View Before Editing**: Always view file content first +2. 
**Exact String Matching**: Ensure `old_str` matches exactly +3. **Include Context**: Include surrounding lines for uniqueness +4. **Use Line Numbers**: View with line numbers for precise edits +5. **Check Success**: Verify `observation.success` before proceeding +6. **Review Diffs**: Check generated diffs for accuracy +7. **Use Undo Sparingly**: Undo only when necessary + +## Error Handling + +### Common Errors + +```python +# File not found +FileEditObservation( + content="Error: File 'missing.py' not found", + success=False +) + +# String not found +FileEditObservation( + content="Error: old_str not found in file", + success=False +) + +# Multiple matches +FileEditObservation( + content="Error: old_str matched multiple locations", + success=False +) + +# Invalid line number +FileEditObservation( + content="Error: insert_line out of range", + success=False +) +``` + +### Recovery Strategies + +```python +# If string not found, view file first +if not observation.success and "not found" in observation.content: + # View file to understand current content + view_action = FileEditAction(command="view", path=path) +``` + +## Implementation Details + +**Source**: [`openhands/tools/file_editor/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/impl.py) + +The editor maintains: +- **File Cache**: Efficient file content caching +- **Edit History**: Per-file undo stack +- **Diff Engine**: Unified diff generation +- **Encoding Detection**: Automatic encoding handling + +## Configuration + +**Source**: [`openhands/tools/file_editor/utils/config.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/utils/config.py) + +```python +# Constants +MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB +MAX_HISTORY_SIZE = 100 # Max undo operations +``` + +## Security Considerations + +- File operations are restricted to working directory +- No execution of file content +- Safe for user-generated content +- Automatic encoding 
detection prevents binary file issues + +## See Also + +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For file system operations +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing +- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/glob.mdx b/sdk/arch/tools/glob.mdx new file mode 100644 index 00000000..8983d0af --- /dev/null +++ b/sdk/arch/tools/glob.mdx @@ -0,0 +1,89 @@ +--- +title: GlobTool +description: Find files using glob patterns with recursive search and flexible matching. +--- + +GlobTool enables file discovery using glob patterns, supporting recursive search, wildcards, and flexible path matching. + +**Source**: [`openhands/tools/glob/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/glob) + +## Usage + +```python +from openhands.tools import GlobTool + +agent = Agent(llm=llm, tools=[GlobTool.create()]) +``` + +## Action Model + +**Source**: [`openhands/tools/glob/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/glob/definition.py) + +```python +class GlobAction(Action): + pattern: str # Glob pattern (e.g., "**/*.py") +``` + +## Observation Model + +```python +class GlobObservation(Observation): + paths: list[str] # List of matching file paths +``` + +## Pattern Syntax + +- `*`: Match any characters except `/` +- `**`: Match any characters including `/` (recursive) +- `?`: Match single character +- `[abc]`: Match any character in brackets +- `[!abc]`: Match any character not in brackets + +## Examples + +### Find Python Files + +```python +GlobAction(pattern="**/*.py") +# Returns: ["src/main.py", "tests/test_main.py", ...] +``` + +### Find Specific Files + +```python +GlobAction(pattern="**/test_*.py") +# Returns: ["tests/test_api.py", "tests/test_utils.py", ...] 
+``` + +### Multiple Extensions + +```python +GlobAction(pattern="**/*.{py,js,ts}") +# Returns: ["script.py", "app.js", "types.ts", ...] +``` + +### Current Directory Only + +```python +GlobAction(pattern="*.txt") +# Returns: ["readme.txt", "notes.txt", ...] +``` + +## Common Use Cases + +- **Code Discovery**: `**/*.py` - Find all Python files +- **Test Files**: `**/test_*.py` - Find test files +- **Configuration**: `**/*.{json,yaml,yml}` - Find config files +- **Documentation**: `**/*.md` - Find markdown files + +## Best Practices + +1. **Use Recursive Patterns**: `**/*` for deep searches +2. **Specific Extensions**: Narrow results with extensions +3. **Combine with GrepTool**: Find files, then search content +4. **Check Results**: Handle empty result lists + +## See Also + +- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative file operations diff --git a/sdk/arch/tools/grep.mdx b/sdk/arch/tools/grep.mdx new file mode 100644 index 00000000..bd879318 --- /dev/null +++ b/sdk/arch/tools/grep.mdx @@ -0,0 +1,140 @@ +--- +title: GrepTool +description: Search file contents using regex patterns with context and match highlighting. +--- + +GrepTool enables content search across files using regex patterns, providing context around matches and detailed results. + +**Source**: [`openhands/tools/grep/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/grep) + +## Usage + +```python +from openhands.tools import GrepTool + +agent = Agent(llm=llm, tools=[GrepTool.create()]) +``` + +## Action Model + +**Source**: [`openhands/tools/grep/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/grep/definition.py) + +```python +class GrepAction(Action): + pattern: str # Regex pattern to search + path: str = "." 
# Directory or file to search + case_sensitive: bool = False # Case sensitivity +``` + +## Observation Model + +```python +class GrepObservation(Observation): + matches: list[dict] # List of matches with context + # Each match contains: + # - file: str - File path + # - line: int - Line number + # - content: str - Matching line +``` + +## Examples + +### Search for Function Definition + +```python +GrepAction( + pattern=r"def\s+\w+\(", + path="src/", + case_sensitive=False +) +# Returns: [ +# {"file": "src/main.py", "line": 10, "content": "def process_data(x):"}, +# ... +# ] +``` + +### Case-Sensitive Search + +```python +GrepAction( + pattern="TODO", + path=".", + case_sensitive=True +) +# Only matches exact case "TODO" +``` + +### Search Specific File + +```python +GrepAction( + pattern="import.*pandas", + path="script.py" +) +``` + +## Pattern Syntax + +Supports Python regex patterns: +- `.`: Any character +- `*`: Zero or more +- `+`: One or more +- `?`: Optional +- `[]`: Character class +- `()`: Group +- `|`: Alternation +- `^`: Line start +- `$`: Line end + +## Common Use Cases + +### Find TODOs + +```python +GrepAction(pattern=r"TODO|FIXME|XXX", path=".") +``` + +### Find Imports + +```python +GrepAction(pattern=r"^import |^from .* import ", path="src/") +``` + +### Find API Keys (for security review) + +```python +GrepAction(pattern=r"api[_-]key|secret|password", path=".") +``` + +### Find Function Calls + +```python +GrepAction(pattern=r"database\.query\(", path=".") +``` + +## Best Practices + +1. **Escape Special Characters**: Use `\` for regex special chars +2. **Use Anchors**: `^` and `$` for line boundaries +3. **Case Insensitive Default**: Unless exact case matters +4. **Narrow Search Paths**: Search specific directories +5. **Combine with GlobTool**: Find files first, then grep + +## Workflow Pattern + +```python +# 1. Find relevant files +glob_action = GlobAction(pattern="**/*.py") + +# 2. 
Search content in those files +grep_action = GrepAction( + pattern="class.*Exception", + path="src/" +) +``` + +## See Also + +- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - View/edit files +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative with `grep` command diff --git a/sdk/arch/tools/overview.mdx b/sdk/arch/tools/overview.mdx new file mode 100644 index 00000000..aadf3f01 --- /dev/null +++ b/sdk/arch/tools/overview.mdx @@ -0,0 +1,185 @@ +--- +title: Tools Overview +description: Pre-built tools for common agent operations including bash execution, file editing, and code search. +--- + +The `openhands.tools` package provides a collection of pre-built, production-ready tools for common agent operations. These tools enable agents to interact with files, execute commands, search code, and manage tasks. + +**Source**: [`openhands/tools/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools) + +## Available Tools + +### Core Tools + +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Execute bash commands with timeout and environment support +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Edit files with diff-based operations and undo support +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning workflows + +### Search Tools + +- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files using glob patterns +- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents with regex support + +### Specialized Tools + +- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Track and manage agent tasks +- **[BrowserUseTool](/sdk/architecture/tools/browser_use.mdx)** - Web browsing and interaction + +## Quick Start + +### Using Individual Tools + +```python +from openhands.sdk import Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from 
pydantic import SecretStr
+
+agent = Agent(
+    llm=LLM(
+        model="anthropic/claude-sonnet-4-20250514",
+        api_key=SecretStr("your-api-key")
+    ),
+    tools=[
+        BashTool.create(),
+        FileEditorTool.create()
+    ]
+)
+```
+
+### Using Tool Presets
+
+```python
+from openhands.tools.preset import get_default_tools, get_planning_tools
+
+# Default toolset for general tasks
+default_tools = get_default_tools()
+
+# Specialized toolset for planning workflows
+planning_tools = get_planning_tools()
+
+agent = Agent(llm=llm, tools=default_tools)
+```
+
+## Tool Structure
+
+All tools follow a consistent structure:
+
+```mermaid
+graph TD
+    Tool[Tool Definition] --> Action[Action Model]
+    Tool --> Observation[Observation Model]
+    Tool --> Executor[Executor Implementation]
+
+    Action --> Params[Input Parameters]
+    Observation --> Result[Output Data]
+    Executor --> Execute["execute() method"]
+
+    style Tool fill:#e1f5fe
+    style Action fill:#fff3e0
+    style Observation fill:#e8f5e8
+    style Executor fill:#f3e5f5
+```
+
+### Tool Components
+
+1. **Action**: Input model defining tool parameters
+2. **Observation**: Output model containing execution results
+3. 
**Executor**: Implementation that executes the tool logic + +## Tool Presets + +**Source**: [`openhands/tools/preset/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/preset) + +### Default Preset + +**Source**: [`openhands/tools/preset/default.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/default.py) + +General-purpose toolset for most tasks: + +```python +from openhands.tools.preset import get_default_tools + +tools = get_default_tools() +# Includes: BashTool, FileEditorTool, GlobTool, GrepTool +``` + +### Planning Preset + +**Source**: [`openhands/tools/preset/planning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py) + +Optimized for planning and multi-file workflows: + +```python +from openhands.tools.preset import get_planning_tools + +tools = get_planning_tools() +# Includes: BashTool, PlanningFileEditorTool, GlobTool, GrepTool, TaskTrackerTool +``` + +## Creating Custom Tools + +See the [Tool Definition Guide](/sdk/architecture/sdk/tool.mdx) for creating custom tools. + +## Tool Security + +Tools support security risk assessment: + +```python +from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk + +agent = Agent( + llm=llm, + tools=[BashTool.create(), FileEditorTool.create()], + security_analyzer=LLMSecurityAnalyzer(llm=llm), + confirmation_policy=ConfirmOnHighRisk() +) +``` + +See [Security Documentation](/sdk/architecture/sdk/security.mdx) for more details. + +## Tool Configuration + +### Working Directory + +Most tools operate relative to a working directory: + +```python +from openhands.tools import BashTool + +bash_tool = BashTool.create(working_dir="/project/path") +``` + +### Timeout Settings + +Configure execution timeouts: + +```python +from openhands.tools import BashTool + +bash_tool = BashTool.create(timeout=60.0) # 60 seconds +``` + +## Best Practices + +1. **Use Presets**: Start with tool presets for common workflows +2. 
**Configure Timeouts**: Set appropriate timeouts for tools +3. **Provide Context**: Use working directories effectively +4. **Enable Security**: Add security analysis for sensitive operations +5. **Filter Tools**: Use `filter_tools_regex` to limit available tools +6. **Test Locally**: Verify tools work in your environment + +## Tool Examples + +Each tool has comprehensive examples: + +- **[Bash Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Command execution +- **[File Editor Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - File manipulation +- **[Planning Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Planning workflows +- **[Task Tracker Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Task management + +## See Also + +- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools +- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents +- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Complete examples diff --git a/sdk/arch/tools/planning_file_editor.mdx b/sdk/arch/tools/planning_file_editor.mdx new file mode 100644 index 00000000..e176c93b --- /dev/null +++ b/sdk/arch/tools/planning_file_editor.mdx @@ -0,0 +1,128 @@ +--- +title: PlanningFileEditorTool +description: Multi-file editing tool optimized for planning workflows with batch operations. +--- + +PlanningFileEditorTool extends FileEditorTool with multi-file editing capabilities optimized for planning agent workflows. 
+ +**Source**: [`openhands/tools/planning_file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/planning_file_editor) + +## Overview + +PlanningFileEditorTool provides: +- All FileEditorTool capabilities +- Optimized for planning workflows +- Batch file operations +- Coordination with TaskTrackerTool + +## Usage + +```python +from openhands.tools import PlanningFileEditorTool + +agent = Agent(llm=llm, tools=[PlanningFileEditorTool.create()]) +``` + +## Relation to FileEditorTool + +PlanningFileEditorTool inherits all FileEditorTool commands: +- `view`: View file contents +- `create`: Create new files +- `str_replace`: Replace strings +- `insert`: Insert lines +- `undo_edit`: Undo changes + +See [FileEditorTool](/sdk/architecture/tools/file_editor.mdx) for detailed command documentation. + +## Planning Workflow Integration + +```mermaid +graph TD + Plan[Create Task Plan] --> TaskTracker[TaskTrackerTool] + TaskTracker --> Edit[Edit Files] + Edit --> PlanningEditor[PlanningFileEditorTool] + PlanningEditor --> UpdateTasks[Update Task Status] + UpdateTasks --> TaskTracker + + style Plan fill:#fff3e0 + style Edit fill:#e1f5fe + style UpdateTasks fill:#e8f5e8 +``` + +## Usage in Planning Workflows + +See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): + +```python +from openhands.tools.preset import get_planning_tools + +# Get planning toolset (includes PlanningFileEditorTool) +tools = get_planning_tools() + +agent = Agent(llm=llm, tools=tools) +``` + +## Multi-File Workflow Example + +```python +# 1. Plan tasks +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Create config file", status="todo"), + Task(title="Create main script", status="todo"), + Task(title="Create tests", status="todo") + ] +) + +# 2. 
Create files +PlanningFileEditAction( + command="create", + path="config.yaml", + file_text="settings:\n debug: true\n" +) + +PlanningFileEditAction( + command="create", + path="main.py", + file_text="import yaml\n\nif __name__ == '__main__':\n pass\n" +) + +# 3. Update task status +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Create config file", status="done"), + Task(title="Create main script", status="done"), + Task(title="Create tests", status="in_progress") + ] +) +``` + +## Best Practices + +1. **Use with TaskTrackerTool**: Coordinate file edits with task status +2. **Plan Before Editing**: Create task plan first +3. **Update Progress**: Mark tasks complete after edits +4. **Follow Workflow**: Plan → Edit → Update → Repeat +5. **Use Planning Preset**: Get all planning tools together + +## When to Use + +Use PlanningFileEditorTool when: +- Building complex multi-file projects +- Following structured planning workflows +- Coordinating with task tracking +- Need agent to manage implementation phases + +Use regular FileEditorTool for: +- Simple file editing tasks +- Single-file modifications +- Ad-hoc editing without planning + +## See Also + +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Base file editing capabilities +- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Task management +- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Complete planning toolset +- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Full workflow example diff --git a/sdk/arch/tools/task_tracker.mdx b/sdk/arch/tools/task_tracker.mdx new file mode 100644 index 00000000..73966ef4 --- /dev/null +++ b/sdk/arch/tools/task_tracker.mdx @@ -0,0 +1,146 @@ +--- +title: TaskTrackerTool +description: Track and manage agent tasks with status updates and structured task lists. 
+--- + +TaskTrackerTool enables agents to create, update, and manage task lists for complex multi-step workflows. + +**Source**: [`openhands/tools/task_tracker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/task_tracker) + +## Usage + +```python +from openhands.tools import TaskTrackerTool + +agent = Agent(llm=llm, tools=[TaskTrackerTool.create()]) +``` + +## Action Model + +**Source**: [`openhands/tools/task_tracker/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/task_tracker/definition.py) + +```python +class TaskTrackerAction(Action): + command: Literal["view", "plan"] + task_list: list[Task] | None = None # For plan command +``` + +### Task Model + +```python +class Task: + title: str # Task title + status: Literal["todo", "in_progress", "done"] # Task status + notes: str | None = None # Optional notes +``` + +## Observation Model + +```python +class TaskTrackerObservation(Observation): + task_list: list[Task] # Current task list + message: str # Status message +``` + +## Commands + +### view +View current task list. + +```python +TaskTrackerAction(command="view") +``` + +### plan +Create or update task list. 
+ +```python +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Setup environment", status="done"), + Task(title="Write code", status="in_progress"), + Task(title="Run tests", status="todo") + ] +) +``` + +## Usage Patterns + +### Initialize Task List + +```python +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Analyze requirements", status="todo"), + Task(title="Design solution", status="todo"), + Task(title="Implement features", status="todo"), + Task(title="Write tests", status="todo"), + Task(title="Deploy", status="todo") + ] +) +``` + +### Update Progress + +```python +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Analyze requirements", status="done"), + Task(title="Design solution", status="in_progress"), + Task(title="Implement features", status="todo"), + Task(title="Write tests", status="todo"), + Task(title="Deploy", status="todo") + ] +) +``` + +### Check Current Status + +```python +TaskTrackerAction(command="view") +# Returns current task list with status +``` + +## Best Practices + +1. **Plan Early**: Create task list at workflow start +2. **Update Regularly**: Mark tasks as progress happens +3. **Use Notes**: Add details for complex tasks +4. **One Task Active**: Focus on one "in_progress" task +5. 
**Mark Complete**: Set "done" when finished + +## Task Status Workflow + +```mermaid +graph LR + TODO[todo] -->|Start work| PROGRESS[in_progress] + PROGRESS -->|Complete| DONE[done] + DONE -->|Reopen if needed| TODO + + style TODO fill:#fff3e0 + style PROGRESS fill:#e1f5fe + style DONE fill:#c8e6c9 +``` + +## Example: Planning Agent + +See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): + +```python +# Planning agent uses TaskTrackerTool for workflow management +from openhands.tools.preset import get_planning_tools + +agent = Agent( + llm=llm, + tools=get_planning_tools() # Includes TaskTrackerTool +) +``` + +## See Also + +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning +- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Planning toolset +- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Complete workflow diff --git a/sdk/arch/workspace/docker.mdx b/sdk/arch/workspace/docker.mdx new file mode 100644 index 00000000..4c26fd52 --- /dev/null +++ b/sdk/arch/workspace/docker.mdx @@ -0,0 +1,330 @@ +--- +title: DockerWorkspace +description: Execute agent operations in isolated Docker containers with automatic container lifecycle management. +--- + +DockerWorkspace provides isolated execution environments using Docker containers. It automatically manages container lifecycle, networking, and resource allocation. 
+ +**Source**: [`openhands/workspace/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/docker) + +## Overview + +DockerWorkspace provides: +- Automatic container creation and cleanup +- Network isolation and port management +- Custom or pre-built Docker images +- Environment variable forwarding +- File system mounting +- Resource limits and controls + +## Usage + +### Basic Usage + +```python +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + base_image="python:3.12" +) + +with workspace: + result = workspace.execute_command("python --version") + print(result.stdout) # Python 3.12.x +``` + +### With Pre-built Image + +```python +workspace = DockerWorkspace( + working_dir="/workspace", + server_image="ghcr.io/all-hands-ai/agent-server:latest" +) +``` + +## Configuration + +**Source**: [`openhands/workspace/docker/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/docker/workspace.py) + +### Core Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `working_dir` | `str` | `"/workspace"` | Working directory in container | +| `base_image` | `str \| None` | `None` | Base image to build agent server from | +| `server_image` | `str \| None` | `None` | Pre-built agent server image | +| `host_port` | `int \| None` | `None` | Host port to bind (auto-assigned if None) | +| `forward_env` | `list[str]` | `["DEBUG"]` | Environment variables to forward | +| `container_name` | `str \| None` | `None` | Container name (auto-generated if None) | +| `platform` | `str \| None` | `None` | Target platform (e.g., "linux/amd64") | + +### Using Base Image + +Build agent server on top of custom base image: + +```python +workspace = DockerWorkspace( + base_image="ubuntu:22.04", + working_dir="/workspace" +) +``` + +Agent server components are installed on top of the base image. 
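Only one of the two image parameters is needed at a time: `server_image` points at a ready-made agent server image, while `base_image` asks the workspace to build one on top of your base. A minimal sketch of that selection logic (`resolve_image` is a hypothetical helper for illustration, not the SDK's implementation):

```python
def resolve_image(base_image=None, server_image=None):
    """Hypothetical helper mirroring the base_image / server_image choice."""
    if server_image is not None:
        # Pre-built agent server image: no build step, fastest startup
        return ("use-prebuilt", server_image)
    if base_image is not None:
        # Build the agent server on top of the given base image
        return ("build-from-base", base_image)
    raise ValueError("Provide either base_image or server_image")


print(resolve_image(server_image="ghcr.io/all-hands-ai/agent-server:latest"))
print(resolve_image(base_image="ubuntu:22.04"))
```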
+ +### Using Pre-built Server Image + +Use pre-built agent server image: + +```python +workspace = DockerWorkspace( + server_image="ghcr.io/all-hands-ai/agent-server:latest", + working_dir="/workspace" +) +``` + +Faster startup, no build time required. + +## Lifecycle Management + +### Automatic Cleanup + +```python +with DockerWorkspace(base_image="python:3.12") as workspace: + # Container created + workspace.execute_command("pip install requests") + # Commands execute in container +# Container automatically stopped and removed +``` + +### Manual Management + +```python +workspace = DockerWorkspace(base_image="python:3.12") + +# Manually start (happens automatically in context manager) +# Use workspace +result = workspace.execute_command("ls") + +# Manually cleanup +workspace.__exit__(None, None, None) +``` + +## Environment Configuration + +### Forward Environment Variables + +```python +import os + +os.environ["DATABASE_URL"] = "postgres://..." +os.environ["API_KEY"] = "secret" + +workspace = DockerWorkspace( + base_image="python:3.12", + forward_env=["DATABASE_URL", "API_KEY", "DEBUG"] +) + +with workspace: + result = workspace.execute_command("echo $DATABASE_URL") + # Outputs: postgres://... +``` + +### Custom Container Name + +```python +workspace = DockerWorkspace( + base_image="python:3.12", + container_name="my-agent-container" +) +``` + +Useful for debugging and monitoring. + +### Platform Specification + +```python +workspace = DockerWorkspace( + base_image="python:3.12", + platform="linux/amd64" # Force specific platform +) +``` + +Useful for Apple Silicon Macs running amd64 images. 
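Conceptually, `forward_env` copies a whitelist of host variables into the container's environment. A stdlib-only sketch of that filtering step (`collect_forwarded_env` is illustrative, not part of the SDK):

```python
import os


def collect_forwarded_env(names: list[str]) -> dict[str, str]:
    # Forward only the listed variables that are actually set on the host;
    # unset names are skipped rather than forwarded as empty strings
    return {name: os.environ[name] for name in names if name in os.environ}


os.environ["DEBUG"] = "1"
os.environ["API_KEY"] = "secret"
forwarded = collect_forwarded_env(["DEBUG", "API_KEY", "NOT_SET_ANYWHERE"])
print(forwarded)  # {'DEBUG': '1', 'API_KEY': 'secret'}
```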
+ +## Port Management + +DockerWorkspace automatically finds available ports for container communication: + +```python +workspace = DockerWorkspace( + base_image="python:3.12", + host_port=None # Auto-assign (default) +) + +# Or specify explicit port +workspace = DockerWorkspace( + base_image="python:3.12", + host_port=8000 # Use specific port +) +``` + +## File Operations + +### File Upload + +```python +workspace.file_upload( + source_path="local_file.txt", + destination_path="/workspace/file.txt" +) +``` + +### File Download + +```python +workspace.file_download( + source_path="/workspace/output.txt", + destination_path="local_output.txt" +) +``` + +## Building Docker Images + +DockerWorkspace can build custom agent server images: + +**Source**: [`openhands/agent_server/docker/build.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/agent_server/docker/build.py) + +```python +from openhands.agent_server.docker.build import ( + BuildOptions, + build +) + +# Build custom image +image_name = build( + BuildOptions( + base_image="ubuntu:22.04", + target="runtime", # or "dev" + platform="linux/amd64", + context_dir="." + ) +) + +# Use built image +workspace = DockerWorkspace(server_image=image_name) +``` + +## Use with Conversation + +```python +from openhands.sdk import Agent, Conversation +from openhands.tools import BashTool, FileEditorTool +from openhands.workspace import DockerWorkspace + +# Create workspace +workspace = DockerWorkspace( + base_image="python:3.12", + working_dir="/workspace" +) + +# Create agent +agent = Agent( + llm=llm, + tools=[BashTool.create(), FileEditorTool.create()] +) + +# Use in conversation +with workspace: + conversation = Conversation(agent=agent, workspace=workspace) + conversation.send_message("Create a Python web scraper") + conversation.run() +``` + +See [`examples/02_remote_agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server) for complete examples. 
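The automatic port assignment described under Port Management boils down to asking the OS for an unused ephemeral port. A stdlib sketch of the idea (illustrative only; the SDK's actual mechanism may differ):

```python
import socket


def find_free_host_port() -> int:
    # Binding to port 0 asks the OS to pick an unused ephemeral port;
    # read it back from the socket before closing
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


port = find_free_host_port()
print(port)
```

Note that a small race window exists: the discovered port could be claimed by another process before the container binds it, which is one reason retrying with an alternative `host_port` (as shown in Troubleshooting) is occasionally necessary.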
+ 
+## Security Benefits
+
+### Isolation
+
+- **Process Isolation**: Container runs separately from host
+- **File System Isolation**: Limited access to host file system
+- **Network Isolation**: Separate network namespace
+
+### Resource Limits
+
+```python
+# Resource limits are configurable via Docker
+# Set through Docker API or Dockerfile
+```
+
+### Sandboxing
+
+DockerWorkspace provides strong sandboxing:
+- Agent cannot access host file system
+- Agent cannot interfere with host processes
+- Agent operates in controlled environment
+
+## Performance Considerations
+
+### Container Startup Time
+
+- **Base Image Build**: 30-60 seconds (first time)
+- **Pre-built Image**: 5-10 seconds
+- **Subsequent Runs**: Uses cached images
+
+### Optimization Tips
+
+1. **Use Pre-built Images**: Faster than building from base image
+2. **Cache Base Images**: Docker caches layers
+3. **Minimize Image Size**: Smaller images start faster
+4. **Reuse Containers**: For multiple operations (advanced)
+
+## Troubleshooting
+
+### Container Fails to Start
+
+```bash
+# Check Docker is running
+docker ps
+
+# Check logs
+docker logs <container-name>
+
+# Verify image exists
+docker images
+```
+
+### Port Already in Use
+
+```python
+# Specify different port
+workspace = DockerWorkspace(
+    base_image="python:3.12",
+    host_port=8001  # Use alternative port
+)
+```
+
+### Permission Issues
+
+```bash
+# Ensure Docker has necessary permissions
+# On Linux, add user to docker group:
+sudo usermod -aG docker $USER
+```
+
+## Best Practices
+
+1. **Use Context Managers**: Always use `with` statement
+2. **Pre-build Images**: Build agent server images ahead of time
+3. **Set Resource Limits**: Configure appropriate limits
+4. **Monitor Containers**: Track resource usage
+5. **Clean Up**: Ensure containers are removed after use
+6. 
**Use Specific Tags**: Pin image versions for reproducibility + +## See Also + +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server running in container +- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Docker workspace examples diff --git a/sdk/arch/workspace/overview.mdx b/sdk/arch/workspace/overview.mdx new file mode 100644 index 00000000..6a539776 --- /dev/null +++ b/sdk/arch/workspace/overview.mdx @@ -0,0 +1,99 @@ +--- +title: Workspace Package Overview +description: Advanced workspace implementations providing sandboxed and remote execution environments. +--- + +The `openhands.workspace` package provides advanced workspace implementations for production deployments, including Docker-based sandboxing and remote API execution. 
+ 
+**Source**: [`openhands/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace)
+
+## Available Workspaces
+
+- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker container isolation
+- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote server execution
+
+## Workspace Hierarchy
+
+```mermaid
+graph TD
+    Base[BaseWorkspace] --> Local[LocalWorkspace]
+    Base --> Remote[RemoteWorkspace]
+    Remote --> Docker[DockerWorkspace]
+    Remote --> API[RemoteAPIWorkspace]
+
+    style Base fill:#e1f5fe
+    style Local fill:#e8f5e8
+    style Remote fill:#fff3e0
+    style Docker fill:#f3e5f5
+    style API fill:#f3e5f5
+```
+
+- **BaseWorkspace**: Core interface (in SDK)
+- **LocalWorkspace**: Direct local execution (in SDK)
+- **RemoteWorkspace**: Base for remote implementations
+- **DockerWorkspace**: Docker container execution
+- **RemoteAPIWorkspace**: API-based remote execution
+
+## Comparison
+
+| Feature | LocalWorkspace | DockerWorkspace | RemoteAPIWorkspace |
+|---------|---------------|-----------------|-------------------|
+| **Isolation** | None | Strong | Strong |
+| **Performance** | Fast | Good | Network latency |
+| **Setup** | None | Docker required | Server required |
+| **Security** | Host system | Sandboxed | Sandboxed |
+| **Use Case** | Development | Production/Testing | Distributed systems |
+
+## Quick Start
+
+### Docker Workspace
+
+```python
+from openhands.workspace import DockerWorkspace
+
+workspace = DockerWorkspace(
+    working_dir="/workspace",
+    base_image="ubuntu:22.04"
+)
+
+with workspace:
+    result = workspace.execute_command("echo 'Hello from Docker'")
+    print(result.stdout)
+```
+
+### Remote API Workspace
+
+```python
+from openhands.workspace import RemoteAPIWorkspace
+
+workspace = RemoteAPIWorkspace(
+    working_dir="/workspace",
+    api_url="https://agent-server.example.com",
+    api_key="your-api-key"
+)
+
+with workspace:
+    result = workspace.execute_command("python script.py")
+    
print(result.stdout) +``` + +## Use Cases + +### Development +Use `LocalWorkspace` for local development and testing. + +### Testing +Use `DockerWorkspace` for isolated test environments. + +### Production +Use `DockerWorkspace` or `RemoteAPIWorkspace` for production deployments. + +### Multi-User Systems +Use `RemoteAPIWorkspace` with centralized agent server. + +## See Also + +- **[SDK Workspace Interface](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker implementation +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote API implementation +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server for remote workspaces diff --git a/sdk/arch/workspace/remote_api.mdx b/sdk/arch/workspace/remote_api.mdx new file mode 100644 index 00000000..cb8ca8a4 --- /dev/null +++ b/sdk/arch/workspace/remote_api.mdx @@ -0,0 +1,325 @@ +--- +title: RemoteAPIWorkspace +description: Connect to centralized agent servers via HTTP API for scalable distributed agent execution. +--- + +RemoteAPIWorkspace enables agent execution on remote servers through HTTP APIs. It's designed for production deployments requiring centralized agent management and multi-user support. 
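Under the hood, every workspace call is an HTTP request carrying a bearer token (see API Communication). As a self-contained sketch, a request for the command endpoint can be assembled with the standard library; the endpoint path comes from the docs, while the JSON payload shape and the `sk-demo` key are assumptions for illustration:

```python
import json
import urllib.request


def build_command_request(api_url: str, api_key: str, command: str):
    # Hypothetical request targeting the POST /api/workspace/command endpoint
    payload = json.dumps({"command": command}).encode()
    return urllib.request.Request(
        url=f"{api_url}/api/workspace/command",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_command_request("https://agent-server.example.com", "sk-demo", "ls")
print(req.full_url, req.get_method())
```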
+ +**Source**: [`openhands/workspace/remote_api/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/remote_api) + +## Overview + +RemoteAPIWorkspace provides: +- HTTP API communication with agent server +- Authentication and authorization +- Centralized resource management +- Multi-user agent execution +- Monitoring and logging + +## Usage + +### Basic Usage + +```python +from openhands.workspace import RemoteAPIWorkspace + +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com", + api_key="your-api-key" +) + +with workspace: + result = workspace.execute_command("python script.py") + print(result.stdout) +``` + +### With Agent + +```python +from openhands.sdk import Agent, Conversation +from openhands.tools import BashTool, FileEditorTool + +# Create workspace +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com", + api_key="your-api-key" +) + +# Create agent +agent = Agent( + llm=llm, + tools=[BashTool.create(), FileEditorTool.create()] +) + +# Use in conversation +conversation = Conversation(agent=agent, workspace=workspace) +conversation.send_message("Your task") +conversation.run() +``` + +## Configuration + +**Source**: [`openhands/workspace/remote_api/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/remote_api/workspace.py) + +### Parameters + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `working_dir` | `str` | Yes | Working directory on server | +| `api_url` | `str` | Yes | Agent server API URL | +| `api_key` | `str` | Yes | Authentication API key | +| `timeout` | `float` | No | Request timeout (default: 30) | + +### Example Configuration + +```python +workspace = RemoteAPIWorkspace( + working_dir="/workspace/user123", + api_url="https://agents.company.com", + api_key="sk-abc123...", + timeout=60.0 # 60 second timeout +) +``` + +## API 
Communication + +### HTTP Endpoints + +RemoteAPIWorkspace communicates with agent server endpoints: + +- `POST /api/workspace/command` - Execute commands +- `POST /api/workspace/upload` - Upload files +- `GET /api/workspace/download` - Download files +- `GET /api/health` - Health check + +### Authentication + +```python +# API key passed in Authorization header +headers = { + "Authorization": f"Bearer {api_key}" +} +``` + +### Error Handling + +```python +try: + result = workspace.execute_command("command") +except ConnectionError: + print("Failed to connect to agent server") +except TimeoutError: + print("Request timed out") +except Exception as e: + print(f"Execution error: {e}") +``` + +## File Operations + +### Upload Files + +```python +workspace.file_upload( + source_path="local_data.csv", + destination_path="/workspace/data.csv" +) +``` + +### Download Files + +```python +workspace.file_download( + source_path="/workspace/results.json", + destination_path="local_results.json" +) +``` + +### Large File Transfer + +```python +# Chunked upload for large files +workspace.file_upload( + source_path="large_dataset.zip", + destination_path="/workspace/dataset.zip" +) +``` + +## Architecture + +```mermaid +graph LR + Client[Client SDK] -->|HTTPS| API[Agent Server API] + API --> Container1[Container 1] + API --> Container2[Container 2] + API --> Container3[Container 3] + + Container1 --> Agent1[Agent] + Container2 --> Agent2[Agent] + Container3 --> Agent3[Agent] + + style Client fill:#e1f5fe + style API fill:#fff3e0 + style Container1 fill:#e8f5e8 + style Container2 fill:#e8f5e8 + style Container3 fill:#e8f5e8 +``` + +## Use Cases + +### Multi-User Platform + +```python +# Each user gets isolated workspace +user_workspace = RemoteAPIWorkspace( + working_dir=f"/workspace/{user_id}", + api_url="https://agents.platform.com", + api_key=user_api_key +) +``` + +### Scalable Agent Execution + +```python +# Server manages resource allocation +# Multiple agents run 
concurrently +# Automatic load balancing +``` + +### Centralized Monitoring + +```python +# Server tracks: +# - Resource usage per user +# - Agent execution logs +# - API usage metrics +# - Error rates and debugging info +``` + +## Security + +### Authentication + +- API key-based authentication +- Per-user access control +- Token expiration and rotation + +### Isolation + +- Separate workspaces per user +- Container-based sandboxing +- Network isolation + +### Data Protection + +- HTTPS communication +- Encrypted data transfer +- Secure file storage + +## Performance Considerations + +### Network Latency + +```python +# Latency depends on: +# - Network connection +# - Geographic distance +# - Server load + +# Optimization: +# - Use regional servers +# - Batch operations +# - Cache frequently accessed data +``` + +### Concurrent Execution + +```python +# Server handles concurrent requests +# Multiple users can run agents simultaneously +# Automatic resource management +``` + +## Deployment + +### Running Agent Server + +See [Agent Server Documentation](/sdk/architecture/agent_server/overview.mdx) for server setup: + +```bash +# Start agent server +docker run -d \ + -p 8000:8000 \ + -e API_KEY=your-secret-key \ + ghcr.io/all-hands-ai/agent-server:latest +``` + +### Using Deployed Server + +```python +# Client connects to deployed server +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://your-server.com", + api_key="your-secret-key" +) +``` + +## Comparison with DockerWorkspace + +| Feature | DockerWorkspace | RemoteAPIWorkspace | +|---------|-----------------|-------------------| +| **Setup** | Local Docker | Remote server | +| **Network** | Local | Internet required | +| **Scaling** | Single machine | Multiple users | +| **Management** | Client-side | Server-side | +| **Latency** | Low | Network dependent | +| **Use Case** | Local dev/test | Production | + +## Best Practices + +1. **Use HTTPS**: Always use secure connections +2. 
**Rotate API Keys**: Regularly update authentication +3. **Set Timeouts**: Configure appropriate timeouts +4. **Handle Network Errors**: Implement retry logic +5. **Monitor Usage**: Track API calls and resource usage +6. **Regional Deployment**: Use nearby servers for lower latency +7. **Batch Operations**: Combine multiple operations when possible + +## Troubleshooting + +### Connection Failures + +```python +# Verify server is reachable +import requests +response = requests.get(f"{api_url}/api/health") +print(response.status_code) # Should be 200 +``` + +### Authentication Errors + +```python +# Verify API key is correct +# Check key has not expired +# Ensure proper authorization headers +``` + +### Timeout Issues + +```python +# Increase timeout for long operations +workspace = RemoteAPIWorkspace( + api_url=api_url, + api_key=api_key, + timeout=120.0 # 2 minutes +) +``` + +## See Also + +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Local Docker execution +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server implementation +- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace examples diff --git a/sdk/guides/activate-skill.mdx b/sdk/guides/activate-skill.mdx new file mode 100644 index 00000000..ca973de0 --- /dev/null +++ b/sdk/guides/activate-skill.mdx @@ -0,0 +1,178 @@ +--- +title: Skills +description: Skills add specialized behaviors, domain knowledge, and context-aware triggers to your agent through structured prompts. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) + + +Skills modify agent behavior by injecting additional context and rules. 
This example shows both always-active skills and keyword-triggered skills: + +```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context import ( + KeywordTrigger, + Skill, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), +] + +agent_context = AgentContext( + skills=[ + Skill( + name="repo.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + # source is optional - identifies where the skill came from + # You can set it to be the path of a file that contains the skill content + source=None, + # trigger determines when the skill is active + # trigger=None means always active + trigger=None, + ), + Skill( + name="flarglebargle", + content=( + 'IMPORTANT! The user has said the magic word "flarglebargle". 
' + "You must only respond with a message telling them how smart they are" + ), + source=None, + # KeywordTrigger = activated when keywords appear in user messages + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ], + system_message_suffix="Always finish your response with the word 'yay!'", + user_message_suffix="The first character of your response should be 'I'", +) + + +# Agent +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) + + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +print("=" * 100) +print("Checking if the repo skill is activated.") +conversation.send_message("Hey are you a grumpy cat?") +conversation.run() + +print("=" * 100) +print("Now sending flarglebargle to trigger the knowledge skill!") +conversation.send_message("flarglebargle!") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/03_activate_skill.py +``` + +### Creating Skills + +Skills are defined with a name, content (the instructions), and an optional trigger: + +```python highlight={2-5,8-10} +agent_context = AgentContext( + skills=[ + Skill( + name="repo.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + trigger=None, # Always active + ), + Skill( + name="flarglebargle", + content='IMPORTANT! The user has said the magic word "flarglebargle". 
' + "You must only respond with a message telling them how smart they are", + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ] +) +``` + +### Keyword Triggers + +Use `KeywordTrigger` to activate skills only when specific words appear: + +```python highlight={3} +Skill( + name="magic-word", + content="Special instructions when magic word is detected", + trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]), +) +``` + +### Message Suffixes + +Add consistent prefixes or suffixes to system and user messages: + +```python highlight={2-3} +agent_context = AgentContext( + skills=[...], + system_message_suffix="Always finish your response with the word 'yay!'", + user_message_suffix="The first character of your response should be 'I'", +) +``` + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers +- **[Confirmation Mode](/sdk/guides/confirmation-mode)** - Add execution approval diff --git a/sdk/guides/async.mdx b/sdk/guides/async.mdx new file mode 100644 index 00000000..b416e30a --- /dev/null +++ b/sdk/guides/async.mdx @@ -0,0 +1,149 @@ +--- +title: Async Operations +description: Use async/await for concurrent agent operations and non-blocking execution. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/11_async.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) + + +Run agents asynchronously for non-blocking execution and concurrent operations: + +```python icon="python" expandable examples/01_standalone_sdk/11_async.py +""" +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). 
The conversation is run in a background +thread and a callback with results is executed in the main runloop +""" + +import asyncio +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.conversation.types import ConversationCallbackType +from openhands.sdk.tool import Tool, register_tool +from openhands.sdk.utils.async_utils import AsyncCallbackWrapper +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +register_tool("TaskTrackerTool", TaskTrackerTool) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), + Tool(name="TaskTrackerTool"), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +# Callback coroutine +async def callback_coro(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Synchronous run conversation +def run_conversation(callback: ConversationCallbackType): + conversation = Conversation(agent=agent, callbacks=[callback]) + + conversation.send_message( + "Hello! Can you create a new Python file named hello.py that prints " + "'Hello, World!'? Use task tracker to plan your steps." + ) + conversation.run() + + conversation.send_message("Great! 
Now delete that file.")
+    conversation.run()
+
+
+async def main():
+    loop = asyncio.get_running_loop()
+
+    # Create the callback
+    callback = AsyncCallbackWrapper(callback_coro, loop)
+
+    # Run the conversation in a background thread and wait for it to finish...
+    await loop.run_in_executor(None, run_conversation, callback)
+
+    print("=" * 100)
+    print("Conversation finished. Got the following LLM messages:")
+    for i, message in enumerate(llm_messages):
+        print(f"Message {i}: {str(message)[:200]}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+```bash Running the Example
+export LLM_API_KEY="your-api-key"
+cd agent-sdk
+uv run python examples/01_standalone_sdk/11_async.py
+```
+
+### Running in a Background Thread
+
+`Conversation.run()` is synchronous, so the example keeps the event loop responsive by running it in an executor while `AsyncCallbackWrapper` forwards events back to the main loop:
+
+```python highlight={3-5}
+async def main():
+    loop = asyncio.get_running_loop()
+    callback = AsyncCallbackWrapper(callback_coro, loop)
+    # The blocking run happens in a worker thread; events arrive via the callback
+    await loop.run_in_executor(None, run_conversation, callback)
+```
+
+### Concurrent Agents
+
+Run multiple agent tasks in parallel with `asyncio.gather()` (here `run_task` is a helper that wraps a conversation in an executor, like `run_conversation` above):
+
+```python highlight={4-7}
+async def main():
+    # Create multiple conversation tasks
+    tasks = [
+        run_task("task 1"),
+        run_task("task 2"),
+        run_task("task 3")
+    ]
+    results = await asyncio.gather(*tasks)
+```
+
+## Next Steps
+
+- **[Persistence](/sdk/guides/persistence)** - Save and restore conversation state
+- **[Send Message While Processing](/sdk/guides/send-message-while-processing)** - Interrupt running agents
diff --git a/sdk/guides/browser-use.mdx b/sdk/guides/browser-use.mdx
new file mode 100644
index 00000000..9db0f953
--- /dev/null
+++ b/sdk/guides/browser-use.mdx
@@ -0,0 +1,117 @@
+---
+title: Browser Use
+description: Enable web browsing and interaction capabilities for your agent.
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py) + + +Give your agent the ability to navigate websites, click elements, fill forms, and extract content: + +```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +register_tool("BrowserToolSet", BrowserToolSet) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), + Tool(name="BrowserToolSet"), +] + +# If you need fine-grained browser control, you can manually register individual browser +# tools by creating a BrowserToolExecutor and providing factories that return customized +# Tool instances before constructing the Agent. 
+ +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Could you go to https://all-hands.dev/ blog page and summarize main " + "points of the latest blog?" +) +conversation.run() + + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/15_browser_use.py +``` + +### Browser Agent + +Use the preset browser agent with built-in browser tools: + +```python highlight={3} +from openhands.tools.preset.browser import get_browser_agent + +agent = get_browser_agent(llm=llm) +conversation = Conversation(agent=agent) +conversation.send_message("Search for OpenHands on GitHub and summarize the README") +``` + +The browser tool enables: +- Web navigation and page loading +- Element clicking and form filling +- Content extraction and screenshots +- Multi-page workflows + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external services diff --git a/sdk/guides/confirmation-mode.mdx b/sdk/guides/confirmation-mode.mdx new file mode 100644 index 00000000..fc30185a --- /dev/null +++ b/sdk/guides/confirmation-mode.mdx @@ -0,0 +1,193 @@ +--- +title: Confirmation Mode +description: Require user approval before executing actions for safe agent operation. 
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) + + +Require user approval before executing agent actions for safe operation: + +```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py +"""OpenHands Agent SDK — Confirmation Mode Example""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, BaseConversation, Conversation +from openhands.sdk.conversation.state import AgentExecutionStatus, ConversationState +from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm +from openhands.tools.preset.default import get_default_agent + + +# Make ^C a clean exit instead of a stack trace +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_action_preview(pending_actions) -> None: + print(f"\nšŸ” Agent created {len(pending_actions)} action(s) awaiting confirmation:") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). + """ + _print_action_preview(pending_actions) + while True: + try: + ans = ( + input("\nDo you want to execute these actions? 
(yes/no): ") + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\nāŒ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("āœ… Approved — executing actions…") + return True + if ans in ("no", "n"): + print("āŒ Rejected — skipping actions…") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: + """ + Drive the conversation until FINISHED. + If WAITING_FOR_CONFIRMATION, ask the confirmer; + on reject, call reject_pending_actions(). + Preserves original error if agent waits but no actions exist. + """ + while conversation.state.agent_status != AgentExecutionStatus.FINISHED: + if ( + conversation.state.agent_status + == AgentExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "āš ļø Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected the actions") + # Let the agent produce a new step or finish + continue + + print("ā–¶ļø Running conversation.run()…") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) +if add_security_analyzer: + print("Agent security analyzer added.") +agent = get_default_agent(llm=llm, add_security_analyzer=add_security_analyzer) +conversation = Conversation(agent=agent, workspace=os.getcwd()) + +# 1) Confirmation mode ON +conversation.set_confirmation_policy(AlwaysConfirm()) +print("\n1) Command that will likely create actions…") +conversation.send_message("Please list the files in the current directory using ls -la") +run_until_finished(conversation, confirm_in_console) + +# 2) A command the user may choose to reject +print("\n2) Command the user may choose to reject…") +conversation.send_message("Please create a file called 'dangerous_file.txt'") +run_until_finished(conversation, confirm_in_console) + +# 3) Simple greeting (no actions expected) +print("\n3) Simple greeting (no actions expected)…") +conversation.send_message("Just say hello to me") +run_until_finished(conversation, confirm_in_console) + +# 4) Disable confirmation mode and run commands directly +print("\n4) Disable confirmation mode and run a command…") +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Please echo 'Hello from confirmation mode example!'") +conversation.run() + +conversation.send_message( + "Please delete any file that was created during this conversation." 
+)
+conversation.run()
+
+print("\n=== Example Complete ===")
+print("Key points:")
+print(
+    "- conversation.run() creates actions; confirmation mode "
+    "sets agent_status=WAITING_FOR_CONFIRMATION"
+)
+print("- User confirmation is handled via a single reusable function")
+print("- Rejection uses conversation.reject_pending_actions() and the loop continues")
+print("- Simple responses work normally without actions")
+print("- Confirmation policy is toggled with conversation.set_confirmation_policy()")
+```
+
+```bash Running the Example
+export LLM_API_KEY="your-api-key"
+cd agent-sdk
+uv run python examples/01_standalone_sdk/04_confirmation_mode_example.py
+```
+
+### Confirmation Policy
+
+Toggle the confirmation policy on the conversation:
+
+```python highlight={3}
+from openhands.sdk.security.confirmation_policy import AlwaysConfirm
+
+conversation.set_confirmation_policy(AlwaysConfirm())
+```
+
+### Custom Confirmation Handler
+
+Implement your approval logic by checking conversation status:
+
+```python highlight={2-3,5}
+while conversation.state.agent_status != AgentExecutionStatus.FINISHED:
+    if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION:
+        pending = ConversationState.get_unmatched_actions(conversation.state.events)
+        if not confirm_in_console(pending):
+            conversation.reject_pending_actions("User rejected")
+        continue
+    conversation.run()
+```
+
+### Rejecting Actions
+
+Provide feedback when rejecting to help the agent try a different approach:
+
+```python highlight={2-4}
+if not user_approved:
+    conversation.reject_pending_actions(
+        "User rejected because actions seem too risky. Please try a safer approach."
+ ) +``` + +## Next Steps + +- **[Security Analyzer](/sdk/guides/security-analyzer)** - Automated security checks +- **[Custom Secrets](/sdk/guides/custom-secrets)** - Secure credential management diff --git a/sdk/guides/context-condenser.mdx b/sdk/guides/context-condenser.mdx new file mode 100644 index 00000000..69cb621d --- /dev/null +++ b/sdk/guides/context-condenser.mdx @@ -0,0 +1,175 @@ +--- +title: Context Condenser +description: Manage agent memory by condensing conversation history to save tokens. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/14_context_condenser.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py) + + +Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information: + +```python icon="python" examples/01_standalone_sdk/14_context_condenser.py +""" +To manage context in long-running conversations, the agent can use a context condenser +that keeps the conversation history within a specified size limit. This example +demonstrates using the `LLMSummarizingCondenser`, which automatically summarizes +older parts of the conversation when the history exceeds a defined threshold. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context.condenser import LLMSummarizingCondenser +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +register_tool("TaskTrackerTool", TaskTrackerTool) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), + Tool(name="TaskTrackerTool"), +] + +# Create a condenser to manage the context. The condenser will automatically truncate +# conversation history when it exceeds max_size, and replaces the dropped events with an +# LLM-generated summary. This condenser triggers when there are more than ten events in +# the conversation history, and always keeps the first two events (system prompts, +# initial user messages) to preserve important context. +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) + +# Agent with condenser +agent = Agent(llm=llm, tools=tools, condenser=condenser) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) + +# Send multiple messages to demonstrate condensation +print("Sending multiple messages to demonstrate LLM Summarizing Condenser...") + +conversation.send_message( + "Hello! Can you create a Python file named math_utils.py with functions for " + "basic arithmetic operations (add, subtract, multiply, divide)?" +) +conversation.run() + +conversation.send_message( + "Great! Now add a function to calculate the factorial of a number." 
+)
+conversation.run()
+
+conversation.send_message("Add a function to check if a number is prime.")
+conversation.run()
+
+conversation.send_message(
+    "Add a function to calculate the greatest common divisor (GCD) of two numbers."
+)
+conversation.run()
+
+conversation.send_message(
+    "Now create a test file to verify all these functions work correctly."
+)
+conversation.run()
+
+
+print("=" * 100)
+print("Conversation finished. Got the following LLM messages:")
+for i, message in enumerate(llm_messages):
+    print(f"Message {i}: {str(message)[:200]}")
+
+# Conversation persistence
+print("Serializing conversation...")
+
+del conversation
+
+# Deserialize the conversation
+print("Deserializing conversation...")
+conversation = Conversation(
+    agent=agent,
+    callbacks=[conversation_callback],
+    persistence_dir="./.conversations",
+    workspace=".",
+)
+
+print("Sending message to deserialized conversation...")
+conversation.send_message("Finally, clean up by deleting both files.")
+conversation.run()
+
+
+print("=" * 100)
+print("Conversation finished with LLM Summarizing Condenser.")
+print(f"Total LLM messages collected: {len(llm_messages)}")
+print("\nThe condenser automatically summarized older conversation history")
+print("when the conversation exceeded the configured max_size threshold.")
+print("This helps manage context length while preserving important information.")
+```
+
+```bash Running the Example
+export LLM_API_KEY="your-api-key"
+cd agent-sdk
+uv run python examples/01_standalone_sdk/14_context_condenser.py
+```
+
+### Setting Up Condensing
+
+Configure a condenser when creating the agent:
+
+```python highlight={3-5}
+from openhands.sdk.context.condenser import LLMSummarizingCondenser
+
+condenser = LLMSummarizingCondenser(
+    llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2
+)
+agent = Agent(llm=llm, tools=tools, condenser=condenser)
+```
+
+When the history grows beyond `max_size` events, the condenser replaces older events (everything after the first `keep_first`) with an LLM-generated summary, reducing token usage while preserving important information.
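The bookkeeping behind condensation is easy to picture. The sketch below is illustrative only — it is not the SDK's implementation — and shows one way a history can be split into events to keep and events to fold into a summary, reserving one slot for the summary itself:

```python
def split_for_condensation(events: list, max_size: int = 10, keep_first: int = 2):
    """Split history into (kept, to_summarize); one slot is reserved for the summary."""
    if len(events) <= max_size:
        return events, []  # under the threshold: nothing to condense
    head = events[:keep_first]            # e.g. system prompt + first user message
    tail_len = max_size - keep_first - 1  # leave room for the summary event
    tail = events[len(events) - tail_len:]
    to_summarize = events[keep_first:len(events) - tail_len]
    return head + tail, to_summarize
```

With twelve events, `max_size=10`, and `keep_first=2`, the first two and last seven events survive, the three in between are summarized, and the kept events plus the summary fit exactly within the limit.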
+ +## Next Steps + +- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage reduction +- **[Conversation Costs](/sdk/guides/conversation-costs)** - Analyze cost savings diff --git a/sdk/guides/conversation-costs.mdx b/sdk/guides/conversation-costs.mdx new file mode 100644 index 00000000..4b596138 --- /dev/null +++ b/sdk/guides/conversation-costs.mdx @@ -0,0 +1,155 @@ +--- +title: Conversation Costs +description: Analyze and optimize conversation token usage and associated costs. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) + + +Analyze token usage and costs to identify expensive operations and optimize budget: + +```python icon="python" examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +import os + +from pydantic import SecretStr +from tabulate import tabulate + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + LLMSummarizingCondenser, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.registry import register_tool +from openhands.sdk.tool.spec import Tool +from openhands.tools.execute_bash import ( + BashTool, +) + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +llm_condenser = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="condenser", +) + +# Tools +register_tool("BashTool", BashTool) + +condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) + +cwd = os.getcwd() +agent = Agent( + llm=llm, + tools=[ + Tool( + name="BashTool", + ), + ], + condenser=condenser, +) + +conversation = Conversation(agent=agent, workspace=cwd) +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text="Please echo 'Hello!'")], + ) +) +conversation.run() + + +# Demonstrate extraneous costs part of the conversation +second_llm = LLM( + usage_id="demo-secondary", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +conversation.llm_registry.add(second_llm) +completion_response = second_llm.completion( + messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] +) + + +# Access total spend +spend = conversation.conversation_stats.get_combined_metrics() +print("\n=== Total Spend for Conversation ===\n") +print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") + print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") + print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + + +spend_per_usage = conversation.conversation_stats.usage_to_metrics +print("\n=== Spend Breakdown by Usage ID ===\n") +rows = [] +for usage_id, metrics in spend_per_usage.items(): + rows.append( + [ + 
usage_id, + f"${metrics.accumulated_cost:.6f}", + metrics.accumulated_token_usage.prompt_tokens + if metrics.accumulated_token_usage + else 0, + metrics.accumulated_token_usage.completion_tokens + if metrics.accumulated_token_usage + else 0, + ] + ) + +print( + tabulate( + rows, + headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], + tablefmt="github", + ) +) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +``` + +### Analyzing Costs + +Read the combined totals from the conversation's statistics: + +```python highlight={1-2} +spend = conversation.conversation_stats.get_combined_metrics() +print(f"Accumulated cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt tokens: {spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion tokens: {spend.accumulated_token_usage.completion_tokens}") +``` + +Use this for budget control, cost attribution, and optimization. + +## Next Steps + +- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track detailed token usage +- **[Context Condenser](/sdk/guides/context-condenser)** - Reduce token usage diff --git a/sdk/guides/custom-secrets.mdx b/sdk/guides/custom-secrets.mdx new file mode 100644 index 00000000..d63bcf7a --- /dev/null +++ b/sdk/guides/custom-secrets.mdx @@ -0,0 +1,104 @@ +--- +title: Custom Secrets +description: Provide environment variables and secrets to the agent workspace securely.
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) + + +Securely provide environment variables and secrets to your agent's workspace: + +```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.conversation.secret_source import SecretSource +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool(name="BashTool"), + Tool(name="FileEditorTool"), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent) + + +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" + + +conversation.update_secrets( + {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()} +) + +conversation.send_message("just echo $SECRET_TOKEN") + +conversation.run() + +conversation.send_message("just echo $SECRET_FUNCTION_TOKEN") + +conversation.run() +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/12_custom_secrets.py +``` + +### Injecting Secrets + +Register secrets on the conversation with `update_secrets`; values can be plain strings or `SecretSource` subclasses that resolve the value on demand: + +```python highlight={1-3} +conversation.update_secrets( + {"API_KEY": os.getenv("API_KEY"), "DB_PASSWORD": MySecretSource()} +) +``` + +The agent can then access these variables in tool executions: +```bash +echo $API_KEY +curl -H "Authorization: Bearer $API_KEY" https://api.example.com +``` + +## Next Steps + +- **[MCP Integration](/sdk/guides/mcp)** - Connect external services with OAuth +- **[Security Analyzer](/sdk/guides/security-analyzer)** - Add security validation diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx new file mode 100644 index 00000000..41977f29 --- /dev/null +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -0,0 +1,65 @@ +--- +title: PR Review Workflow +description: Automate pull request reviews with AI-powered code analysis using GitHub Actions. +--- + + +This example is available on GitHub: [examples/github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/02_pr_review) + + +Automatically review pull requests when labeled, providing comprehensive feedback on code quality, security, and best practices. + +## Quick Start + +```bash +# 1. Copy workflow to your repository +cp examples/github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY + +# 3. Create a "review-this" label in your repository +# Go to Issues → Labels → New label +``` + +## Features + +- **Automatic Trigger** - Reviews start when `review-this` label is added +- **Comprehensive Analysis** - Analyzes changes in full repository context +- **Detailed Feedback** - Covers code quality, security, best practices +- **GitHub Integration** - Posts comments directly to the PR + +## Usage + +### Trigger a Review + +1. Open a pull request +2. Add the `review-this` label +3. Wait for the workflow to complete +4.
Review feedback posted as PR comments + +## Configuration + +Edit `.github/workflows/pr-review.yml` to customize: + +```yaml +env: + LLM_MODEL: openhands/claude-sonnet-4-5-20250929 + # LLM_BASE_URL: 'https://custom-api.example.com' # Optional +``` + +## Review Coverage + +The agent analyzes: + +- **Code Quality** - Readability, maintainability, patterns +- **Security** - Potential vulnerabilities and risks +- **Best Practices** - Language and framework conventions +- **Improvements** - Specific actionable suggestions +- **Positive Feedback** - Recognition of good practices + +## Related Documentation + +- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/agent_script.py) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/workflow.yml) +- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/prompt.py) diff --git a/sdk/guides/github-workflows/routine-maintenance.mdx b/sdk/guides/github-workflows/routine-maintenance.mdx new file mode 100644 index 00000000..86b42168 --- /dev/null +++ b/sdk/guides/github-workflows/routine-maintenance.mdx @@ -0,0 +1,74 @@ +--- +title: Routine Maintenance Workflow +description: Automate routine maintenance tasks with GitHub Actions and OpenHands agents. +--- + + +This example is available on GitHub: [examples/github_workflows/01_basic_action/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/01_basic_action) + + +Set up automated or scheduled GitHub Actions workflows to handle routine maintenance tasks like dependency updates, documentation improvements, and code cleanup. + +## Quick Start + +```bash +# 1. Copy workflow to your repository +cp examples/github_workflows/01_basic_action/workflow.yml .github/workflows/maintenance.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY + +# 3. 
Configure the prompt in workflow.yml +# See below for options +``` + +## Configuration + +### Option A: Direct Prompt + +```yaml +env: + PROMPT_STRING: 'Check for outdated dependencies and create a PR to update them' + LLM_MODEL: openhands/claude-sonnet-4-5-20250929 +``` + +### Option B: Remote Prompt + +```yaml +env: + PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' + LLM_MODEL: openhands/claude-sonnet-4-5-20250929 +``` + +## Usage + +### Manual Trigger + +1. Go to **Actions** → "Maintenance Task" +2. Click **Run workflow** +3. Optionally override prompt settings +4. Click **Run workflow** + +### Scheduled Runs + +Uncomment the schedule section in `workflow.yml`: + +```yaml +on: + schedule: + - cron: "0 2 * * *" # Run at 2 AM UTC daily +``` + +## Example Use Cases + +- **Dependency Updates** - Check and update outdated packages +- **Documentation** - Update docs to reflect code changes +- **Test Coverage** - Identify and improve under-tested code +- **Linting** - Apply formatting and linting fixes +- **Link Validation** - Find and report broken links + +## Related Documentation + +- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/workflow.yml) +- [GitHub Actions Cron Syntax](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule) diff --git a/sdk/guides/image-input.mdx b/sdk/guides/image-input.mdx new file mode 100644 index 00000000..89dce95c --- /dev/null +++ b/sdk/guides/image-input.mdx @@ -0,0 +1,138 @@ +--- +title: Image Input +description: Send images to multimodal agents for vision-based tasks and analysis. 
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) + + +Send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA: + +```python icon="python" examples/01_standalone_sdk/17_image_input.py +"""OpenHands Agent SDK — Image Input Example. + +This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds +vision support by sending an image to the agent alongside text instructions. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.registry import register_tool +from openhands.sdk.tool.spec import Tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool + + +logger = get_logger(__name__) + +# Configure LLM (vision-capable model) +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="vision-llm", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() + +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +register_tool("TaskTrackerTool", TaskTrackerTool) + +agent = Agent( + llm=llm, + tools=[ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), + Tool(name="TaskTrackerTool"), + ], +) + +llm_messages = [] # collect raw LLM messages for inspection + + +def conversation_callback(event: Event) -> None: + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png" + +conversation.send_message( + Message( + role="user", + vision_enabled=True, + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) +conversation.run() + +conversation.send_message( + "Great! Please save your description and caption into image_report.md." +) +conversation.run() + + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/17_image_input.py +``` + +### Sending Images + +Pass images along with text in the message content, with vision enabled: + +```python highlight={6-10} +from openhands.sdk import ImageContent, Message, TextContent + +conversation.send_message( + Message( + role="user", + vision_enabled=True, + content=[ + TextContent(text="Analyze this screenshot"), + ImageContent(image_urls=["https://example.com/screenshot.png"]), + ], + ) +) +``` + +Works with vision-capable LLMs such as GPT-4o and Claude. + +## Next Steps + +- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns +- **[Async Operations](/sdk/guides/async)** - Process multiple images concurrently diff --git a/sdk/guides/interactive-terminal.mdx b/sdk/guides/interactive-terminal.mdx new file mode 100644 index 00000000..34dc284b --- /dev/null +++ b/sdk/guides/interactive-terminal.mdx @@ -0,0 +1,126 @@ +--- +title: Streaming & Interactive Terminal +description: Stream events in real-time to display agent progress and reasoning to users. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py) + + +Stream agent events in real-time to display progress and reasoning to users: + +```python icon="python" expandable examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +tools = [ + Tool( + name="BashTool", + params={"no_change_timeout_seconds": 3}, + ) +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Enter python interactive mode by directly running `python3`, then tell me " + "the current time, and exit python interactive mode." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py +``` + +### Streaming Events + +Process events as they occur by registering callbacks on the conversation: + +```python highlight={1-3} +def on_event(event: Event): + if isinstance(event, LLMConvertibleEvent): + print(f"šŸ”§ {event.to_llm_message()}") + +conversation = Conversation(agent=agent, callbacks=[on_event], workspace=cwd) +conversation.send_message("Build a web server") +conversation.run() +``` + +### Single-Turn Mode + +Wait for the agent to complete before continuing (see [Hello World](/sdk/guides/hello-world)): + +```python highlight={3} +conversation = Conversation(agent=agent) +conversation.send_message("Create a Python script") +conversation.run() # Blocks until done +``` + +### Displaying Reasoning + +Show the agent's thought process by inspecting the converted LLM messages in your callback: + +```python highlight={3-5} +def show_reasoning(event: Event): + if isinstance(event, LLMConvertibleEvent): + msg = event.to_llm_message() + if hasattr(msg, "reasoning") and msg.reasoning: + print(f"šŸ’­ {msg.reasoning}") +``` + +## Next Steps + +- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/async)** - Non-blocking streaming diff --git a/sdk/guides/llm-metrics.mdx b/sdk/guides/llm-metrics.mdx new file mode 100644 index 00000000..3bc555c2 --- /dev/null +++ b/sdk/guides/llm-metrics.mdx @@ -0,0 +1,131 @@ +--- +title: LLM Metrics +description: Track token usage, costs, and performance metrics for your agents. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) + + +Track token usage, costs, and performance metrics from LLM interactions: + +```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool(name="BashTool"), + Tool(name="FileEditorTool"), +] + +# Add MCP Tools +mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} + +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +assert llm.metrics is not None +print( + f"Conversation finished. 
Final LLM metrics with details: {llm.metrics.model_dump()}" +) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/13_get_llm_metrics.py +``` + +### Getting Metrics + +Access metrics from the LLM after running the conversation: + +```python highlight={3-4} +conversation.run() + +assert llm.metrics is not None +print(llm.metrics.model_dump()) +``` + +### Tracking Changes Over Time + +Compare accumulated cost between operations: + +```python highlight={1,4} +cost_before = llm.metrics.accumulated_cost +conversation.send_message("more work") +conversation.run() +cost_spent = llm.metrics.accumulated_cost - cost_before +``` + +Metrics include the accumulated cost and accumulated token usage (prompt, completion, and cache read/write tokens). + +## Next Steps + +- **[Conversation Costs](/sdk/guides/conversation-costs)** - Calculate costs per conversation +- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing diff --git a/sdk/guides/llm-registry.mdx b/sdk/guides/llm-registry.mdx new file mode 100644 index 00000000..80bba8fd --- /dev/null +++ b/sdk/guides/llm-registry.mdx @@ -0,0 +1,142 @@ +--- +title: LLM Registry +description: Dynamically select and configure language models using the LLM registry.
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + +Use the LLM registry to manage multiple LLM providers and dynamically switch between models: + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +tools = [Tool(name="BashTool")] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +completion_response = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content +if completion_response.choices and completion_response.choices[0].message: # type: ignore + content = completion_response.choices[0].message.content # type: ignore + print(f"Direct completion response: {content}") +else: + print("No response content available") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" +cd agent-sdk +uv run python examples/01_standalone_sdk/05_use_llm_registry.py +``` + +### Using the Registry + +Add configured LLMs to the registry, then retrieve them anywhere by usage ID: + +```python highlight={3-5} +from openhands.sdk import LLMRegistry + +llm_registry = LLMRegistry() +llm_registry.add(main_llm) +llm = llm_registry.get("agent") +``` + +### Multi-Model Applications + +Register different models under distinct usage IDs and pick the right one per task: + +```python highlight={1-2,5,8} +llm_registry.add(LLM(usage_id="cheap", model="openhands/gpt-4o-mini", api_key=SecretStr(api_key))) +llm_registry.add(LLM(usage_id="powerful", model="openhands/claude-sonnet-4-5-20250929", api_key=SecretStr(api_key))) + +# Use the cheap model for simple tasks +simple_agent = Agent(llm=llm_registry.get("cheap"), tools=tools) + +# Use the powerful model for complex tasks +complex_agent = Agent(llm=llm_registry.get("powerful"), tools=tools) +``` + +## Next Steps + +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage
and costs diff --git a/sdk/guides/llm-routing.mdx b/sdk/guides/llm-routing.mdx new file mode 100644 index 00000000..8bbe0332 --- /dev/null +++ b/sdk/guides/llm-routing.mdx @@ -0,0 +1,136 @@ +--- +title: LLM Routing +description: Route requests to different LLMs based on task requirements and complexity. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) + + +Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: + +```python icon="python" examples/01_standalone_sdk/19_llm_routing.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.llm.router import MultimodalRouter +from openhands.tools.preset.default import get_default_tools + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="litellm_proxy/mistral/devstral-small-2507", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) + +# Tools +tools = get_default_tools() # Use our default openhands experience + +# Agent +agent = Agent(llm=multimodal_router, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() +) + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Hi there, who trained you?"))], + ) +) +conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[ + ImageContent( + image_urls=["http://images.cocodataset.org/val2017/000000039769.jpg"] + ), + TextContent(text=("What do you see in the image above?")), + ], + ) +) +conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Who trained you as an LLM?"))], + ) +) +conversation.run() + + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/19_llm_routing.py +``` + +### Setting Up Routing + +Group the candidate models into a router, which the agent uses like a single LLM: + +```python highlight={3-6} +from openhands.sdk.llm.router import MultimodalRouter + +router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) +agent = Agent(llm=router, tools=tools) +``` + +The router automatically selects an appropriate model for each request (for example, sending messages that contain images to the vision-capable primary model), optimizing both cost and performance. + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage and costs diff --git a/sdk/guides/model-reasoning.mdx b/sdk/guides/model-reasoning.mdx new file mode 100644 index 00000000..9b1fb2cd --- /dev/null +++ b/sdk/guides/model-reasoning.mdx @@ -0,0 +1,261 @@ +--- +title: Model Reasoning +description: Access model reasoning traces from Anthropic thinking blocks and OpenAI responses API. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/22_model_reasoning.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/22_model_reasoning.py) + + +View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. This example demonstrates two approaches: + +1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning +2. **OpenAI Responses Reasoning** - GPT's reasoning effort parameter + +```python icon="python" expandable examples/01_standalone_sdk/22_model_reasoning.py +""" +Example: Model Reasoning - Anthropic Thinking & OpenAI Responses + +Demonstrates two approaches to accessing model reasoning: +1.
Anthropic's extended thinking feature with thinking blocks +2. OpenAI's Responses API with reasoning effort parameter + +Both approaches allow you to see the model's internal reasoning process +for transparency, debugging, and understanding decision-making. +""" + +from __future__ import annotations + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + RedactedThinkingBlock, + ThinkingBlock, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +def example_anthropic_thinking(): + """Demonstrate Anthropic Claude's extended thinking with thinking blocks.""" + print("\n" + "=" * 80) + print("EXAMPLE 1: Anthropic Extended Thinking") + print("=" * 80) + + api_key = os.getenv("LLM_API_KEY") + assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") + base_url = os.getenv("LLM_BASE_URL") + + llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + ) + + # Setup agent with bash tool + register_tool("BashTool", BashTool) + agent = Agent(llm=llm, tools=[Tool(name="BashTool")]) + + # Callback to display thinking blocks + def show_thinking(event: Event): + if isinstance(event, LLMConvertibleEvent): + message = event.to_llm_message() + if hasattr(message, "thinking_blocks") and message.thinking_blocks: + print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks") + for i, block in enumerate(message.thinking_blocks): + if isinstance(block, RedactedThinkingBlock): + print(f" Block {i + 1}: {block.data}") + elif isinstance(block, ThinkingBlock): + preview = block.thinking[:100] + print(f" Block {i + 1}: {preview}...") + + conversation = Conversation( + agent=agent, callbacks=[show_thinking], workspace=os.getcwd() + ) + + conversation.send_message( + "Calculate compound interest for $10,000 at 5% annually, " + "compounded quarterly for 3 years. Show your work.", + ) + conversation.run() + + conversation.send_message( + "Now, write that number to ANTHROPIC_RESULT.txt.", + ) + conversation.run() + print("āœ… Anthropic thinking example complete!") + + +def example_openai_responses(): + """Demonstrate OpenAI's Responses API with reasoning effort.""" + print("\n" + "=" * 80) + print("EXAMPLE 2: OpenAI Responses Reasoning") + print("=" * 80) + + api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") + assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." 
+ + model = os.getenv("LLM_MODEL", "openhands/gpt-5-codex") + base_url = os.getenv("LLM_BASE_URL") + + llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + # Responses-path options + reasoning_effort="high", + # Logging / behavior tweaks + log_completions=False, + usage_id="agent", + ) + + agent = get_default_agent( + llm=llm, + cli_mode=True, # disable browser tools for env simplicity + ) + + llm_messages = [] # collect raw LLM-convertible messages for inspection + + def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + msg = event.to_llm_message() + llm_messages.append(msg) + # Show reasoning if available + if hasattr(msg, "reasoning") and msg.reasoning: + preview = str(msg.reasoning)[:100] + print(f"šŸ’­ Reasoning detected: {preview}...") + + conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), + ) + + # Keep the tasks short for demo purposes + conversation.send_message("Create a file called OPENAI_RESULT.txt with a fun fact.") + conversation.run() + + conversation.send_message("Now delete OPENAI_RESULT.txt.") + conversation.run() + + print("=" * 80) + print(f"āœ… Collected {len(llm_messages)} LLM messages with reasoning traces") + print("āœ… OpenAI responses example complete!") + + +if __name__ == "__main__": + # Detect which model is being used and run appropriate example + model = os.getenv("LLM_MODEL", "") + + if "claude" in model.lower() or "anthropic" in model.lower(): + print("šŸ” Detected Anthropic model - running thinking blocks example") + example_anthropic_thinking() + elif "gpt" in model.lower() or "openai" in model.lower(): + print("šŸ” Detected OpenAI model - running responses reasoning example") + example_openai_responses() + else: + print("āš ļø Model not specified or unclear. Running both examples...") + print(" Set LLM_MODEL to 'claude-...' or 'gpt-...' 
to run specific example") + try: + example_anthropic_thinking() + except Exception as e: + print(f"āš ļø Anthropic example failed: {e}") + + try: + example_openai_responses() + except Exception as e: + print(f"āš ļø OpenAI example failed: {e}") + + print("\n" + "=" * 80) + print("āœ… All reasoning examples complete!") + print("=" * 80) +``` + +```bash Running the Example +# For Anthropic Claude +export LLM_API_KEY="your-anthropic-api-key" +export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" +cd agent-sdk +uv run python examples/01_standalone_sdk/22_model_reasoning.py + +# For OpenAI GPT +export LLM_API_KEY="your-openai-api-key" +export LLM_MODEL="openhands/gpt-5-codex" +cd agent-sdk +uv run python examples/01_standalone_sdk/22_model_reasoning.py +``` + +## Anthropic Thinking Blocks + +Access Claude's internal thinking process with thinking blocks: + +```python highlight={7-12} +def show_thinking(event: Event): + if isinstance(event, LLMConvertibleEvent): + message = event.to_llm_message() + if hasattr(message, "thinking_blocks") and message.thinking_blocks: + print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks") + for block in message.thinking_blocks: + if isinstance(block, RedactedThinkingBlock): + print(f"Redacted: {block.data}") + elif isinstance(block, ThinkingBlock): + print(f"Thinking: {block.thinking}") + +conversation = Conversation(agent=agent, callbacks=[show_thinking]) +``` + +Claude uses thinking blocks to reason through complex problems step-by-step, improving accuracy on difficult tasks. 
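As a sanity check on the example's task itself, the compound-interest prompt has a closed-form answer you can compare against what the agent writes to `ANTHROPIC_RESULT.txt`:

```python
# Closed-form check for the example's task: $10,000 at 5% APR,
# compounded quarterly for 3 years: A = P * (1 + r/n) ** (n * t)
principal = 10_000.0
annual_rate = 0.05
periods_per_year = 4
years = 3

amount = principal * (1 + annual_rate / periods_per_year) ** (periods_per_year * years)
print(f"Expected final amount: ${amount:,.2f}")  # ≈ $11,607.55
```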
+
+## OpenAI Responses Reasoning
+
+Access GPT's reasoning traces with the Responses API:
+
+```python highlight={6}
+llm = LLM(
+    model="openhands/gpt-5-codex",
+    api_key=SecretStr(api_key),
+    base_url=base_url,
+    # Enable reasoning with effort level
+    reasoning_effort="high",
+)
+```
+
+Then capture reasoning in your callback:
+
+```python highlight={4-5}
+def conversation_callback(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        msg = event.to_llm_message()
+        if hasattr(msg, "reasoning") and msg.reasoning:
+            print(f"šŸ’­ Reasoning: {msg.reasoning}")
+```
+
+## Use Cases
+
+**Debugging**: Understand why the agent made specific decisions or took certain actions.
+
+**Transparency**: Show users how the AI arrived at its conclusions.
+
+**Quality Assurance**: Identify flawed reasoning patterns or logic errors.
+
+**Learning**: Study how models approach complex problems.
+
+## Next Steps
+
+- **[Interactive Terminal](/sdk/guides/interactive-terminal)** - Display reasoning in real-time
+- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage and performance
+- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities
diff --git a/sdk/guides/pause-and-resume.mdx b/sdk/guides/pause-and-resume.mdx
new file mode 100644
index 00000000..aaf243e6
--- /dev/null
+++ b/sdk/guides/pause-and-resume.mdx
@@ -0,0 +1,121 @@
+---
+title: Pause and Resume
+description: Pause agent execution, perform operations, and resume without losing state.
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) + + +Pause agent execution mid-task and resume from where it left off: + +```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py +import os +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.conversation.state import AgentExecutionStatus +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent, workspace=os.getcwd()) + + +print("Simple pause example - Press Ctrl+C to pause") + +# Send a message to get the conversation started +conversation.send_message("repeatedly say hello world and don't stop") + +# Start the agent in a background thread +thread = threading.Thread(target=conversation.run) +thread.start() + +try: + # Main loop - similar to the user's sample script + while ( + conversation.state.agent_status != AgentExecutionStatus.FINISHED + and conversation.state.agent_status != AgentExecutionStatus.PAUSED + ): + # Send encouraging messages periodically + conversation.send_message("keep going! 
you can do it!")
+        time.sleep(1)
+except KeyboardInterrupt:
+    conversation.pause()
+
+thread.join()
+
+print(f"Agent status: {conversation.state.agent_status}")
+```
+
+```bash Running the Example
+export LLM_API_KEY="your-api-key"
+cd agent-sdk
+uv run python examples/01_standalone_sdk/09_pause_example.py
+```
+
+### Pausing Execution
+
+Pause the agent from another thread or after a delay:

+```python highlight={4-6,9}
+import threading
+import time
+
+def pause_after_delay(conversation, seconds):
+    time.sleep(seconds)
+    conversation.pause()
+
+thread = threading.Thread(target=pause_after_delay, args=(conversation, 5))
+thread.start()
+conversation.run()  # Will pause after 5 seconds
+```
+
+### Resuming Execution
+
+After performing your operations, call `run()` again to continue the paused conversation:
+
+```python highlight={4}
+# Agent is paused, perform operations
+process_results()
+
+conversation.run()  # Continues from where it paused
+```
+
+## Next Steps
+
+- **[Persistence](/sdk/guides/persistence)** - Save and restore conversation state
+- **[Send Message While Processing](/sdk/guides/send-message-while-processing)** - Interrupt running agents
diff --git a/sdk/guides/persistence.mdx b/sdk/guides/persistence.mdx
new file mode 100644
index 00000000..772f9cd3
--- /dev/null
+++ b/sdk/guides/persistence.mdx
@@ -0,0 +1,158 @@
+---
+title: Persistence
+description: Save and restore conversation state for multi-session workflows.
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py) + + +Save conversation state to disk and restore it later for long-running or multi-session workflows: + +```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py +import os +import uuid + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool(name="BashTool"), + Tool(name="FileEditorTool"), +] + +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + } +} +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands. 
Then write 3 facts "
+    "about the project into FACTS.txt."
+)
+conversation.run()
+
+conversation.send_message("Great! Now delete that file.")
+conversation.run()
+
+print("=" * 100)
+print("Conversation finished. Got the following LLM messages:")
+for i, message in enumerate(llm_messages):
+    print(f"Message {i}: {str(message)[:200]}")
+
+# Conversation persistence
+print("Serializing conversation...")
+
+del conversation
+
+# Deserialize the conversation
+print("Deserializing conversation...")
+conversation = Conversation(
+    agent=agent,
+    callbacks=[conversation_callback],
+    workspace=cwd,
+    persistence_dir=persistence_dir,
+    conversation_id=conversation_id,
+)
+
+print("Sending message to deserialized conversation...")
+conversation.send_message("Hey what did you create? Return an agent finish action")
+conversation.run()
+```
+
+```bash Running the Example
+export LLM_API_KEY="your-api-key"
+cd agent-sdk
+uv run python examples/01_standalone_sdk/10_persistence.py
+```
+
+### Saving State
+
+Create a conversation with a persistence directory and a unique ID to enable persistence:
+
+```python highlight={3-8}
+import uuid
+
+conversation_id = uuid.uuid4()
+conversation = Conversation(
+    agent=agent,
+    persistence_dir="./.conversations",
+    conversation_id=conversation_id,
+)
+
+conversation.send_message("Start long task")
+conversation.run()  # State automatically saved
+```
+
+### Restoring State
+
+Restore a conversation using the same persistence directory and ID:
+
+```python highlight={2-6}
+# Later, in a different session
+restored = Conversation(
+    agent=agent,
+    persistence_dir="./.conversations",  # Same directory as before
+    conversation_id=conversation_id,  # Same ID as before
+)
+
+restored.send_message("Continue task")
+restored.run()  # Continues from saved state
+```
+
+State includes message history, events, agent state, and tool outputs.
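Conversation IDs are what tie sessions together, so they need to be stored somewhere between runs. One lightweight approach (illustrative only — the registry file name and layout are not part of the SDK) is a small JSON file mapping task names to IDs:

```python
import json
import uuid
from pathlib import Path

ID_FILE = Path("./.conversations/ids.json")  # illustrative location

def get_or_create_conversation_id(task_name: str) -> str:
    """Return a stable conversation ID for a task, creating one on first use."""
    ids = json.loads(ID_FILE.read_text()) if ID_FILE.exists() else {}
    if task_name not in ids:
        ids[task_name] = str(uuid.uuid4())
        ID_FILE.parent.mkdir(parents=True, exist_ok=True)
        ID_FILE.write_text(json.dumps(ids, indent=2))
    return ids[task_name]

first = get_or_create_conversation_id("weekly-report")
second = get_or_create_conversation_id("weekly-report")
print(first == second)  # True - later sessions get the same ID back
```

You could then pass `uuid.UUID(get_or_create_conversation_id("weekly-report"))` as the `conversation_id` when constructing the conversation.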
+ +## Next Steps + +- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/async)** - Non-blocking operations diff --git a/sdk/guides/planning-agent-workflow.mdx b/sdk/guides/planning-agent-workflow.mdx new file mode 100644 index 00000000..6f3fa09e --- /dev/null +++ b/sdk/guides/planning-agent-workflow.mdx @@ -0,0 +1,172 @@ +--- +title: Planning Agent Workflow +description: Use planning-oriented agent with task tracking for complex multi-step workflows. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/24_planning_agent_workflow.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) + + +Use a planning agent that breaks tasks into subtasks and tracks progress systematically: + +```python icon="python" examples/01_standalone_sdk/24_planning_agent_workflow.py +#!/usr/bin/env python3 +""" +Planning Agent Workflow Example + +This example demonstrates a two-stage workflow: +1. Planning Agent: Analyzes the task and creates a detailed implementation plan +2. Execution Agent: Implements the plan with full editing capabilities + +The task: Create a Python web scraper that extracts article titles and URLs +from a news website, handles rate limiting, and saves results to JSON. 
+""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.llm import content_to_str +from openhands.tools.preset.default import get_default_agent +from openhands.tools.preset.planning import get_planning_agent + + +def get_event_content(event): + """Extract content from an event.""" + if hasattr(event, "llm_message"): + return "".join(content_to_str(event.llm_message.content)) + return str(event) + + +"""Run the planning agent workflow example.""" + +# Create a temporary workspace +workspace_dir = Path(tempfile.mkdtemp()) +print(f"Working in: {workspace_dir}") + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="agent", +) + +# Task description +task = """ +Create a Python web scraper with the following requirements: +- Scrape article titles and URLs from a news website +- Handle HTTP errors gracefully with retry logic +- Save results to a JSON file with timestamp +- Use requests and BeautifulSoup for scraping + +Do NOT ask for any clarifying questions. Directly create your implementation plan. 
+""" + +print("=" * 80) +print("PHASE 1: PLANNING") +print("=" * 80) + +# Create Planning Agent with read-only tools +planning_agent = get_planning_agent(llm=llm) + +# Create conversation for planning +planning_conversation = Conversation( + agent=planning_agent, + workspace=str(workspace_dir), +) + +# Run planning phase +print("Planning Agent is analyzing the task and creating implementation plan...") +planning_conversation.send_message( + f"Please analyze this web scraping task and create a detailed " + f"implementation plan:\n\n{task}" +) +planning_conversation.run() + +print("\n" + "=" * 80) +print("PLANNING COMPLETE") +print("=" * 80) +print(f"Implementation plan saved to: {workspace_dir}/PLAN.md") + +print("\n" + "=" * 80) +print("PHASE 2: EXECUTION") +print("=" * 80) + +# Create Execution Agent with full editing capabilities +execution_agent = get_default_agent(llm=llm, cli_mode=True) + +# Create conversation for execution +execution_conversation = Conversation( + agent=execution_agent, + workspace=str(workspace_dir), +) + +# Prepare execution prompt with reference to the plan file +execution_prompt = f""" +Please implement the web scraping project according to the implementation plan. + +The detailed implementation plan has been created and saved at: {workspace_dir}/PLAN.md + +Please read the plan from PLAN.md and implement all components according to it. + +Create all necessary files, implement the functionality, and ensure everything +works together properly. 
+""" + +print("Execution Agent is implementing the plan...") +execution_conversation.send_message(execution_prompt) +execution_conversation.run() + +# Get the last message from the conversation +execution_result = execution_conversation.state.events[-1] + +print("\n" + "=" * 80) +print("EXECUTION RESULT:") +print("=" * 80) +print(get_event_content(execution_result)) + +print("\n" + "=" * 80) +print("WORKFLOW COMPLETE") +print("=" * 80) +print(f"Project files created in: {workspace_dir}") + +# List created files +print("\nCreated files:") +for file_path in workspace_dir.rglob("*"): + if file_path.is_file(): + print(f" - {file_path.relative_to(workspace_dir)}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/24_planning_agent_workflow.py +``` + +### Using the Planning Agent + +Get a pre-configured planning agent: + +```python highlight={3-4} +from openhands.tools.preset.planning import get_planning_agent + +agent = get_planning_agent(llm=llm) +conversation = Conversation(agent=agent) +conversation.send_message("Build a web scraper with tests and documentation") +conversation.run() +``` + +The planning agent systematically breaks down complex tasks into subtasks and tracks progress through multi-step workflows. + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized tools for planning +- **[Skills](/sdk/guides/activate-skill)** - Compose planning with skills diff --git a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx new file mode 100644 index 00000000..4373e11e --- /dev/null +++ b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx @@ -0,0 +1,42 @@ +--- +title: API Sandboxed Server +description: Connect to hosted API-based agent server for fully managed infrastructure. 
+--- + + +This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) + + +Connect to a hosted API-based agent server for fully managed infrastructure without running your own server. + +## How to Run + +```bash +export LLM_API_KEY="your-api-key" +export AGENT_SERVER_URL="https://api.openhands.ai" +export AGENT_SERVER_API_KEY="your-server-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +``` + +## Key Concept + +```python +conversation = RemoteConversation( + agent_server_url="https://api.openhands.ai", + api_key=server_api_key +) +``` + +No server management required - connect to hosted API. + +## Benefits + +- **Zero Ops** - No server management +- **Scalability** - Auto-scaling infrastructure +- **Reliability** - Managed uptime and monitoring + +## Related Documentation + +- [Agent Server Architecture](/sdk/architecture/agent-server) +- [Remote Workspace](/sdk/architecture/workspace/remote) diff --git a/sdk/guides/remote-agent-server/browser-with-docker.mdx b/sdk/guides/remote-agent-server/browser-with-docker.mdx new file mode 100644 index 00000000..5d775df2 --- /dev/null +++ b/sdk/guides/remote-agent-server/browser-with-docker.mdx @@ -0,0 +1,44 @@ +--- +title: Browser with Docker Sandboxed Server +description: Use browser tools with Docker-sandboxed agent server for web automation. +--- + + +This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) + + +Combine browser automation capabilities with Docker isolation for secure web interaction. 
+ +## How to Run + +```bash +# Start server with browser support +docker run -p 8000:8000 \ + -e LLM_API_KEY="your-api-key" \ + ghcr.io/all-hands-ai/runtime:latest-browser + +# Run client +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +``` + +## Key Concept + +```python +conversation = RemoteConversation(agent_server_url="http://localhost:8000") +conversation.send_message("Navigate to GitHub and search for OpenHands") +``` + +Browser tools run in isolated Docker container with the agent. + +## Benefits + +- **Secure Browsing** - Isolate web interactions +- **Clean Environment** - Fresh browser state for each session +- **Resource Control** - Limit browser resource usage + +## Related Documentation + +- [Browser Tool](/sdk/architecture/tools/browser) +- [Docker Workspace](/sdk/architecture/workspace/docker) diff --git a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx new file mode 100644 index 00000000..cf98a963 --- /dev/null +++ b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx @@ -0,0 +1,184 @@ +--- +title: Docker Workspace & Sandboxed Server +description: Run agents in isolated Docker containers for security and reproducibility. +--- + + +This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) + + +Docker workspaces provide complete isolation by running agents in containers. Use for production deployments, testing, and untrusted code execution. + +## DockerWorkspace + +Execute in isolated Docker containers with security boundaries. 
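Docker must be installed and its daemon running before a Docker workspace can start. A quick preflight check like this (a stdlib-only sketch, not part of the SDK) fails fast with a clearer error than a mid-run container failure:

```python
import shutil
import subprocess

def docker_available() -> bool:
    """Return True if the `docker` CLI is on PATH and the daemon responds."""
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` talks to the daemon, so it fails if only the CLI exists
        subprocess.run(["docker", "info"], capture_output=True, check=True, timeout=10)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False

if docker_available():
    print("Docker is ready.")
else:
    print("Docker is not available - install it or start the daemon first.")
```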
+ +### Direct Usage + +```python +from openhands.workspace import DockerWorkspace +from openhands.sdk import Conversation + +workspace = DockerWorkspace( + working_dir="/workspace", + base_image="python:3.12" +) + +with workspace: + conversation = Conversation(agent=agent, workspace=workspace) + conversation.send_message("Build a web server") + conversation.run() +# Container automatically cleaned up +``` + +See [`01_docker_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_docker_workspace.py) + +### When to Use + +- **Production** - Isolated execution environment +- **Testing** - Clean, reproducible environments +- **Untrusted code** - Run agent in sandbox +- **Multi-user** - Each user gets isolated container + +### Configuration Options + +```python +DockerWorkspace( + working_dir="/workspace", + base_image="ubuntu:22.04", # Build from base image + # OR + server_image="ghcr.io/all-hands-ai/agent-server:latest", # Pre-built image + host_port=None, # Auto-assign port + platform="linux/amd64" # Platform override +) +``` + +### Pre-built Images + +Use pre-built images for faster startup: + +```python +workspace = DockerWorkspace( + working_dir="/workspace", + server_image="ghcr.io/all-hands-ai/agent-server:latest" +) +``` + +No build time - container starts immediately. + +### File Transfer + +Copy files to/from container: + +```python +# Upload file +workspace.upload_file("/local/path/file.txt", "/workspace/file.txt") + +# Download file +workspace.download_file("/workspace/output.txt", "/local/path/output.txt") +``` + +## Docker Sandboxed Server + +Run agent server in Docker and connect remotely. 
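The sandboxed server can take a few seconds to boot inside the container, so a small readiness poll (an illustrative helper, not part of the SDK) avoids connection errors in the client:

```python
import socket
import time

def wait_for_server(host: str = "localhost", port: int = 8000, timeout: float = 30.0) -> bool:
    """Poll until the agent server's TCP port accepts connections, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

Call `wait_for_server()` after `docker run` returns and before creating the client connection.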
+ +### How to Run + +```bash +# Start server in Docker +docker run -p 8000:8000 \ + -e LLM_API_KEY="your-api-key" \ + ghcr.io/all-hands-ai/runtime:latest + +# Run client +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py +``` + +### Client Connection + +```python +from openhands.sdk import RemoteConversation + +conversation = RemoteConversation( + agent_server_url="http://localhost:8000", + api_key=api_key +) +conversation.send_message("Your task") +conversation.run() +``` + +## Benefits + +**Security:** +- Complete isolation from host system +- Agent cannot access host files +- Agent cannot affect host processes + +**Resources:** +- Control CPU/memory limits +- Monitor container resource usage +- Kill containers if needed + +**Reproducibility:** +- Consistent environment across deployments +- Version-controlled container images +- Easy rollback to previous versions + +## Docker vs Local Workspace + +| Feature | LocalWorkspace | DockerWorkspace | +|---------|----------------|-----------------| +| **Security** | Low (host access) | High (isolated) | +| **Setup** | None | Docker required | +| **Performance** | Fast | Slight overhead | +| **Cleanup** | Manual | Automatic | +| **Best for** | Development | Production | + +## Best Practices + +### 1. Use Pre-built Images + +```python +# āœ… Good: Fast startup +server_image="ghcr.io/all-hands-ai/agent-server:latest" + +# āŒ Slow: Builds on every run +base_image="python:3.12" +``` + +### 2. Clean Up Containers + +Use context manager for automatic cleanup: + +```python +with workspace: + # Work with workspace + pass +# Container automatically removed +``` + +### 3. Resource Limits + +Set Docker resource limits: + +```bash +docker run --memory="2g" --cpus="1.5" \ + ghcr.io/all-hands-ai/runtime:latest +``` + +### 4. 
Volume Mounts
+
+Mount local directories for persistent data:
+
+```bash
+docker run -v /local/data:/workspace/data \
+  ghcr.io/all-hands-ai/runtime:latest
+```
+
+## Related Documentation
+
+- **[Browser with Docker](/sdk/guides/remote-agent-server/browser-with-docker)** - Browser in container
+- **[Workspace Architecture](/sdk/architecture/sdk/workspace)** - Technical design
+- **[Agent Server Architecture](/sdk/architecture/agent-server)** - Server details
diff --git a/sdk/guides/remote-agent-server/local-agent-server.mdx b/sdk/guides/remote-agent-server/local-agent-server.mdx
new file mode 100644
index 00000000..67383a43
--- /dev/null
+++ b/sdk/guides/remote-agent-server/local-agent-server.mdx
@@ -0,0 +1,91 @@
+---
+title: Local Agent Server & Workspaces
+description: Understand workspaces and run agent server locally for client-server architecture.
+---
+
+<Note>
+This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py)
+</Note>
+
+Workspaces define where agents execute commands and access files. This guide introduces workspace concepts and demonstrates the local agent server setup.
+
+## Workspace Types
+
+| Type | Security | Setup | Use Case |
+|------|----------|-------|----------|
+| **LocalWorkspace** | Low (host access) | None | Development |
+| **DockerWorkspace** | High (isolated) | Docker | Testing, Production |
+| **RemoteAPIWorkspace** | High (isolated) | Server | Multi-user, Cloud |
+
+## LocalWorkspace
+
+Execute directly on your machine - default for standalone SDK.
+
+### Usage
+
+```python
+from openhands.sdk import Conversation
+
+# LocalWorkspace is implicit (no workspace parameter needed)
+conversation = Conversation(agent=agent)
+conversation.send_message("Create a Python script")
+conversation.run()
+```
+
+Operations run in current working directory with direct host access.
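Even in development, you can limit the blast radius by pointing the conversation at a dedicated scratch directory instead of your working tree (a sketch — only the `workspace=` argument comes from the SDK examples):

```python
import tempfile

# Create a throwaway directory and use it as the agent's workspace.
scratch_dir = tempfile.mkdtemp(prefix="agent-workspace-")
print(f"Agent will work in: {scratch_dir}")

# conversation = Conversation(agent=agent, workspace=scratch_dir)
```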
+
+### When to Use
+
+- **Development** - Quick iteration and testing
+- **Local files** - Direct access to local filesystem
+- **Simple tasks** - No isolation needed
+
+### Security Considerations
+
+āš ļø **Warning**: Agent has full host access:
+- Can modify any accessible files
+- Can execute any commands
+- **Not recommended for production or untrusted code**
+
+## Remote Agent Server
+
+Run agent server and connect remotely for resource isolation and scalability.
+
+### How to Run
+
+```bash
+# Terminal 1: Start server
+export LLM_API_KEY="your-api-key"
+cd agent-sdk
+uv run python -m openhands.agent_server
+
+# Terminal 2: Run client
+export LLM_API_KEY="your-api-key"
+uv run python examples/02_remote_agent_server/01_convo_with_local_agent_server.py
+```
+
+### Client Connection
+
+```python
+from openhands.sdk import RemoteConversation
+
+conversation = RemoteConversation(
+    agent_server_url="http://localhost:8000",
+    api_key=api_key
+)
+conversation.send_message("Your task")
+conversation.run()
+```
+
+### Benefits
+
+- **Resource Isolation** - Server handles compute-intensive tasks
+- **Scalability** - Multiple clients connect to same server
+- **Deployment** - Separate client and execution environments
+- **Security** - Isolate agent execution from client
+
+## Related Documentation
+
+- **[Docker Sandboxed Server](/sdk/guides/remote-agent-server/docker-sandboxed-server)** - Isolated execution
+- **[Agent Server Architecture](/sdk/architecture/agent-server)** - Server details
+- **[Workspace Architecture](/sdk/architecture/sdk/workspace)** - Technical design
diff --git a/sdk/guides/remote-agent-server/vscode-with-docker.mdx b/sdk/guides/remote-agent-server/vscode-with-docker.mdx
new file mode 100644
index 00000000..13be6c57
--- /dev/null
+++ b/sdk/guides/remote-agent-server/vscode-with-docker.mdx
@@ -0,0 +1,44 @@
+---
+title: VS Code with Docker Sandboxed Server
+description: Enable VS Code integration for code editing with Docker-sandboxed agent.
+--- + + +This example is available on GitHub: [examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py) + + +Use VS Code tools with Docker-sandboxed agent server for code editing and development workflows. + +## How to Run + +```bash +# Start server with VS Code support +docker run -p 8000:8000 \ + -e LLM_API_KEY="your-api-key" \ + ghcr.io/all-hands-ai/runtime:latest-vscode + +# Run client +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py +``` + +## Key Concept + +```python +conversation = RemoteConversation(agent_server_url="http://localhost:8000") +conversation.send_message("Create a Python Flask app with routes") +``` + +Agent uses VS Code tools for editing, navigation, and refactoring in isolated environment. + +## Benefits + +- **Rich Code Editing** - VS Code features in agent workflows +- **Isolated Development** - Safe code changes in container +- **Full IDE Features** - Syntax highlighting, auto-complete, etc. + +## Related Documentation + +- [VS Code Tool](/sdk/architecture/tools/vscode) +- [Docker Workspace](/sdk/architecture/workspace/docker) diff --git a/sdk/guides/security-analyzer.mdx b/sdk/guides/security-analyzer.mdx new file mode 100644 index 00000000..5e69d104 --- /dev/null +++ b/sdk/guides/security-analyzer.mdx @@ -0,0 +1,174 @@ +--- +title: LLM Security Analyzer +description: Analyze actions for security risks before execution using LLM-based analysis. 
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py) + + +Automatically analyze agent actions for security risks before execution: + +```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py +"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified) + +This example shows how to use the LLMSecurityAnalyzer to automatically +evaluate security risks of actions before execution. +""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, BaseConversation, Conversation +from openhands.sdk.conversation.state import AgentExecutionStatus, ConversationState +from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +# Clean ^C exit: no stack trace noise +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_blocked_actions(pending_actions) -> None: + print(f"\nšŸ”’ Security analyzer blocked {len(pending_actions)} high-risk action(s):") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_high_risk_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. + """ + _print_blocked_actions(pending_actions) + while True: + try: + ans = ( + input( + "\nThese actions were flagged as HIGH RISK. " + "Do you want to execute them anyway? 
(yes/no): " + ) + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\nāŒ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("āœ… Approved — executing high-risk actions...") + return True + if ans in ("no", "n"): + print("āŒ Rejected — skipping high-risk actions...") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished_with_security( + conversation: BaseConversation, confirmer: Callable[[list], bool] +) -> None: + """ + Drive the conversation until FINISHED. + - If WAITING_FOR_CONFIRMATION: ask the confirmer. + * On approve: set agent_status = IDLE (keeps original example’s behavior). + * On reject: conversation.reject_pending_actions(...). + - If WAITING but no pending actions: print warning and set IDLE (matches original). + """ + while conversation.state.agent_status != AgentExecutionStatus.FINISHED: + if ( + conversation.state.agent_status + == AgentExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "āš ļø Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected high-risk actions") + continue + + print("ā–¶ļø Running conversation.run()...") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), +] + +# Agent with security analyzer +security_analyzer = LLMSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) + +# Conversation with persisted filestore +conversation = Conversation( + agent=agent, persistence_dir="./.conversations", workspace="." +) +conversation.set_confirmation_policy(ConfirmRisky()) + +print("\n1) Safe command (LOW risk - should execute automatically)...") +conversation.send_message("List files in the current directory") +conversation.run() + +print("\n2) Potentially risky command (may require confirmation)...") +conversation.send_message( + "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" +) +run_until_finished_with_security(conversation, confirm_high_risk_in_console) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/16_llm_security_analyzer.py +``` + +### Security Analyzer Configuration + +Attach an `LLMSecurityAnalyzer` to the agent and pair it with a confirmation policy so that high-risk actions require approval: + +```python highlight={4-5,7} +from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer + +security_analyzer = LLMSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) + +conversation.set_confirmation_policy(ConfirmRisky()) +``` + +The security analyzer: +- Assigns a risk level to each proposed action before execution +- Pauses high-risk actions for user confirmation when `ConfirmRisky` is set +- Lets low-risk actions execute automatically + +## Next Steps + +- 
**[Confirmation Mode](/sdk/guides/confirmation-mode)** - Add manual approval for actions +- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools diff --git a/sdk/guides/send-message-while-processing.mdx b/sdk/guides/send-message-while-processing.mdx new file mode 100644 index 00000000..949cb15f --- /dev/null +++ b/sdk/guides/send-message-while-processing.mdx @@ -0,0 +1,184 @@ +--- +title: Send Message While Processing +description: Interrupt running agents to provide additional context or corrections. +--- + + +This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) + + +Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: + +```python icon="python" examples/01_standalone_sdk/18_send_message_while_processing.py +""" +Example demonstrating that user messages can be sent and processed while +an agent is busy. + +This example demonstrates a key capability of the OpenHands agent system: the ability +to receive and process new user messages even while the agent is actively working on +a previous task. This is made possible by the agent's event-driven architecture. + +Demonstration Flow: +1. Send initial message asking agent to: + - Write "Message 1 sent at [time], written at [CURRENT_TIME]" + - Wait 3 seconds + - Write "Message 2 sent at [time], written at [CURRENT_TIME]" + [time] is the time the message was sent to the agent + [CURRENT_TIME] is the time the agent writes the line +2. Start agent processing in a background thread +3. While agent is busy (during the 3-second delay), send a second message asking to add: + - "Message 3 sent at [time], written at [CURRENT_TIME]" +4. 
Verify that all three lines are processed and included in the final document + +Expected Evidence: +The final document will contain three lines with dual timestamps: +- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) +- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) +- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) + +The timestamps will show that Message 3 was sent while the agent was running, +but was still successfully processed and written to the document. + +This proves that: +- The second user message was sent while the agent was processing the first task +- The agent successfully received and processed the second message +- The agent's event system allows for real-time message integration during processing + +Key Components Demonstrated: +- Conversation.send_message(): Adds messages to events list immediately +- Agent.step(): Processes all events including newly added messages +- Threading: Allows message sending while agent is actively processing +""" # noqa + +import os +import threading +import time +from datetime import datetime + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent) + + +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") + + +print("=== Send Message While Processing Example ===") + +# Step 1: Send initial message +start_time = timestamp() +conversation.send_message( + f"Create a file called document.txt and write this first sentence: " + f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write the line. " + f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa +) + +# Step 2: Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Step 3: Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() + +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." 
+) + +# Wait for completion +thread.join() + +# Verification +document_path = os.path.join(cwd, "document.txt") +if os.path.exists(document_path): + with open(document_path) as f: + content = f.read() + + print("\nDocument contents:") + print("─────────────────────") + print(content) + print("─────────────────────") + + # Check that all three messages were processed + if "Message 1" in content and "Message 2" in content and "Message 3" in content: + print("\nSUCCESS: Agent processed all three messages!") + print( + "This proves the agent received the second message while processing the first task." # noqa + ) + else: + print("\nWARNING: Agent may not have processed every message") + + # Clean up + os.remove(document_path) +else: + print("WARNING: document.txt was not created") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/18_send_message_while_processing.py +``` + +### Sending Messages During Execution + +Use threading to send messages while the agent is running: + +```python highlight={5-6,8-9} +import threading +import time + +def send_correction(): + time.sleep(3) + conversation.send_message("Actually, use Python 3.11 instead") + +thread = threading.Thread(target=send_correction) +thread.start() +conversation.run() +``` + +The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. + +## Next Steps + +- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Control execution flow +- **[Interactive Terminal](/sdk/guides/interactive-terminal)** - Stream events in real-time diff --git a/sdk/guides/stuck-detector.mdx b/sdk/guides/stuck-detector.mdx new file mode 100644 index 00000000..38ea0edb --- /dev/null +++ b/sdk/guides/stuck-detector.mdx @@ -0,0 +1,101 @@ +--- +title: Stuck Detector +description: Detect and handle agents stuck in repetitive loops automatically.
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) + + +Automatically detect when an agent is stuck in loops or not making progress, and stop it gracefully: + +```python icon="python" examples/01_standalone_sdk/20_stuck_detector.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +agent = get_default_agent(llm=llm) + +llm_messages = [] + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with built-in stuck detection +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), + # This is by default True, shown here for clarity of the example + stuck_detection=True, +) + +# Send a task that will be caught by stuck detection +conversation.send_message( + "Please execute 'ls' command 5 times, each in its own " + "action without any thought and then exit at the 6th step." +) + +# Run the conversation - stuck detection happens automatically +conversation.run() + +assert conversation.stuck_detector is not None +final_stuck_check = conversation.stuck_detector.is_stuck() +print(f"Final stuck status: {final_stuck_check}") + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/20_stuck_detector.py +``` + +### Configuring Stuck Detection + +Enable stuck detection on the conversation (it is on by default): + +```python highlight={4-5} +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + # Enabled by default; shown here for clarity + stuck_detection=True, +) + +assert conversation.stuck_detector is not None +print(conversation.stuck_detector.is_stuck()) +``` + +When the detector recognizes a repetitive pattern in recent events, it stops the run instead of letting the agent loop indefinitely. + +## Next Steps + +- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Manual execution control +- **[Confirmation Mode](/sdk/guides/confirmation-mode)** - Approve actions before execution From 41be81b948a05c678727a2505565fb9ee44910d2 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 14:18:55 -0400 Subject: [PATCH 02/58] remove agent-server api --- docs.json | 4 ---- 1 file changed, 4 deletions(-) diff --git a/docs.json b/docs.json index 2355244e..5cce5164 100644 --- a/docs.json +++ b/docs.json @@ -270,10 +270,6 @@ { "tab": "OpenHands (Core) API", "openapi": "openapi/openapi.json" - }, - { - "tab": "Agent SDK (API)", - "openapi": "openapi/agent-sdk.json" } ], "global": { From c91e3682fa9502c5b4edba969bf930a2dd0788be Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 16:36:25 -0400 Subject: [PATCH 03/58] done with llm and condenser --- docs.json | 23 ++- sdk/guides/activate-skill.mdx | 8 +- sdk/guides/context-condenser.mdx | 65 +++++++- sdk/guides/llm-metrics.mdx | 41 +++-- sdk/guides/llm-reasoning.mdx | 257 ++++++++++++++++++++++++++++++ sdk/guides/llm-registry.mdx | 32 ++-- sdk/guides/llm-routing.mdx | 36 +++-- sdk/guides/model-reasoning.mdx | 261 ------------------------------- 8 files changed, 386 insertions(+), 337 deletions(-) create
mode 100644 sdk/guides/llm-reasoning.mdx delete mode 100644 sdk/guides/model-reasoning.mdx diff --git a/docs.json b/docs.json index 5cce5164..069872e5 100644 --- a/docs.json +++ b/docs.json @@ -181,20 +181,18 @@ { "group": "Guides", "pages": [ + "sdk/guides/hello-world", + "sdk/guides/custom-tools", + "sdk/guides/mcp", + "sdk/guides/activate-skill", + "sdk/guides/context-condenser", { - "group": "Getting Started", - "pages": [ - "sdk/guides/hello-world", - "sdk/guides/custom-tools", - "sdk/guides/mcp" - ] - }, - { - "group": "Agent Configuration", + "group": "LLM Configuration", "pages": [ "sdk/guides/llm-registry", "sdk/guides/llm-routing", - "sdk/guides/model-reasoning" + "sdk/guides/llm-reasoning", + "sdk/guides/llm-metrics" ] }, { @@ -204,15 +202,12 @@ "sdk/guides/pause-and-resume", "sdk/guides/confirmation-mode", "sdk/guides/send-message-while-processing", - "sdk/guides/conversation-costs", - "sdk/guides/llm-metrics", - "sdk/guides/context-condenser" + "sdk/guides/conversation-costs" ] }, { "group": "Agent Capabilities", "pages": [ - "sdk/guides/activate-skill", "sdk/guides/async", "sdk/guides/planning-agent-workflow", "sdk/guides/browser-use", diff --git a/sdk/guides/activate-skill.mdx b/sdk/guides/activate-skill.mdx index ca973de0..591569a0 100644 --- a/sdk/guides/activate-skill.mdx +++ b/sdk/guides/activate-skill.mdx @@ -1,5 +1,5 @@ --- -title: Skills +title: Agent Skills description: Skills add specialized behaviors, domain knowledge, and context-aware triggers to your agent through structured prompts. 
--- @@ -128,7 +128,7 @@ uv run python examples/01_standalone_sdk/03_activate_skill.py Skills are defined with a name, content (the instructions), and an optional trigger: -```python highlight={2-5,8-10} +```python highlight={3-14} agent_context = AgentContext( skills=[ Skill( @@ -151,7 +151,7 @@ agent_context = AgentContext( Use `KeywordTrigger` to activate skills only when specific words appear: -```python highlight={3} +```python highlight={4} Skill( name="magic-word", content="Special instructions when magic word is detected", @@ -163,7 +163,7 @@ Skill( Add consistent prefixes or suffixes to system and user messages: -```python highlight={2-3} +```python highlight={3-4} agent_context = AgentContext( skills=[...], system_message_suffix="Always finish your response with the word 'yay!'", diff --git a/sdk/guides/context-condenser.mdx b/sdk/guides/context-condenser.mdx index 69cb621d..b9db6bca 100644 --- a/sdk/guides/context-condenser.mdx +++ b/sdk/guides/context-condenser.mdx @@ -3,13 +3,59 @@ title: Context Condenser description: Manage agent memory by condensing conversation history to save tokens. --- +## What is a Context Condenser? + +A **context condenser** is a crucial component that addresses one of the most persistent challenges in AI agent development: managing growing conversation context efficiently. As conversations with AI agents grow longer, the cumulative history leads to: + +- **šŸ’° Increased API Costs**: More tokens in the context means higher costs per API call +- **ā±ļø Slower Response Times**: Larger contexts take longer to process +- **šŸ“‰ Reduced Effectiveness**: LLMs become less effective when dealing with excessive irrelevant information + +The context condenser solves this by intelligently summarizing older parts of the conversation while preserving essential information needed for the agent to continue working effectively. 
+ +## Default Implementation: LLMSummarizingCondenser + +OpenHands SDK provides `LLMSummarizingCondenser` as the default condenser implementation. This condenser uses an LLM to generate summaries of conversation history when it exceeds the configured size limit. + +### How It Works + +When conversation history exceeds a defined threshold, the LLM-based condenser: + +1. **Keeps recent messages intact** - The most recent exchanges remain unchanged for immediate context +2. **Preserves key information** - Important details like user goals, technical specifications, and critical files are retained +3. **Summarizes older content** - Earlier parts of the conversation are condensed into concise LLM-generated summaries +4. **Maintains continuity** - The agent retains awareness of past progress without processing every historical interaction + +![Condenser Overview](https://openhands.dev/assets/blog/20250409-oh-condenser-release/condenser-overview.png) + + +This approach achieves remarkable efficiency gains: +- Up to **2x reduction** in per-turn API costs +- **Consistent response times** even in long sessions +- **Equivalent or better performance** on software engineering tasks + +Learn more about the implementation and benchmarks in our [blog post on context condensation](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). + +### Extensibility + +The `LLMSummarizingCondenser` extends the `RollingCondenser` base class, which provides a framework for condensers that work with rolling conversation history.
You can create custom condensers by extending base classes ([source code](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)): + +- **`RollingCondenser`** - For condensers that apply condensation to rolling history +- **`CondenserBase`** - For more specialized condensation strategies + +This architecture allows you to implement custom condensation logic tailored to your specific needs while leveraging the SDK's conversation management infrastructure. + + +### Example Usage + This example is available on GitHub: [examples/01_standalone_sdk/14_context_condenser.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py) + Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information: -```python icon="python" examples/01_standalone_sdk/14_context_condenser.py +```python icon="python" expandable examples/01_standalone_sdk/14_context_condenser.py """ To manage context in long-running conversations, the agent can use a context condenser that keeps the conversation history within a specified size limit. This example @@ -158,17 +204,24 @@ uv run python examples/01_standalone_sdk/14_context_condenser.py ### Setting Up Condensing -Configure a condenser when creating the agent: +Create an `LLMSummarizingCondenser` to manage the context. +The condenser automatically truncates conversation history when it exceeds `max_size`, replacing the dropped events with an LLM-generated summary. + +This condenser triggers when there are more than `max_size` events in +the conversation history, and always keeps the first `keep_first` events (system prompts, +initial user messages) to preserve important context.
```python highlight={3-4} -from openhands.sdk.context import LLMCondenser +from openhands.sdk.context import LLMSummarizingCondenser -condenser = LLMCondenser(llm=llm, max_context_length=100000) +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) + +# Agent with condenser agent = Agent(llm=llm, tools=tools, condenser=condenser) ``` -When context exceeds `max_context_length`, the condenser summarizes older messages to reduce token usage while maintaining important information. - ## Next Steps - **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage reduction diff --git a/sdk/guides/llm-metrics.mdx b/sdk/guides/llm-metrics.mdx index 3bc555c2..21f72bd5 100644 --- a/sdk/guides/llm-metrics.mdx +++ b/sdk/guides/llm-metrics.mdx @@ -1,5 +1,5 @@ --- -title: LLM Metrics +title: Metrics Tracking description: Track token usage, costs, and performance metrics for your agents. --- @@ -99,31 +99,30 @@ uv run python examples/01_standalone_sdk/13_get_llm_metrics.py ### Getting Metrics -Access metrics after running the conversation: +Access metrics directly from the LLM object after running the conversation: -```python highlight={3-6} +```python highlight={3-4} conversation.run() -metrics = conversation.get_llm_metrics() -print(f"Input tokens: {metrics.input_tokens}") -print(f"Output tokens: {metrics.output_tokens}") -print(f"Total cost: ${metrics.cost:.4f}") -``` - -### Tracking Changes Over Time - -Compare metrics between operations: - -```python highlight={1,4} -initial_metrics = conversation.get_llm_metrics() -conversation.send_message("more work") -conversation.run() -final_metrics = conversation.get_llm_metrics() - -tokens_used = final_metrics.input_tokens - initial_metrics.input_tokens +assert llm.metrics is not None +print(f"Final LLM metrics: {llm.metrics.model_dump()}") ``` -Metrics include: `input_tokens`, `output_tokens`, `cost`, `api_calls`, and `cache_reads` (if supported). 
+The `llm.metrics` object is an instance of the [Metrics class](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: + +- `accumulated_cost` - Total accumulated cost across all API calls +- `accumulated_token_usage` - Aggregated token usage with fields like: + - `prompt_tokens` - Number of input tokens processed + - `completion_tokens` - Number of output tokens generated + - `cache_read_tokens` - Cache hits (if supported by the model) + - `cache_write_tokens` - Cache writes (if supported by the model) + - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) + - `context_window` - Context window size used +- `costs` - List of individual cost records per API call +- `token_usages` - List of detailed token usage records per API call +- `response_latencies` - List of response latency metrics per API call + +For more details on the available metrics and methods, refer to the [source code](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). ## Next Steps diff --git a/sdk/guides/llm-reasoning.mdx b/sdk/guides/llm-reasoning.mdx new file mode 100644 index 00000000..e7539ec7 --- /dev/null +++ b/sdk/guides/llm-reasoning.mdx @@ -0,0 +1,257 @@ +--- +title: Reasoning +description: Access model reasoning traces from Anthropic extended thinking and OpenAI responses API. +--- + +View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. This guide demonstrates two provider-specific approaches: + +1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning +2. 
**OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter + +## Anthropic Extended Thinking + + +This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py) + + +Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step. + +```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py +"""Example demonstrating Anthropic's extended thinking feature with thinking blocks.""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + RedactedThinkingBlock, + ThinkingBlock, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool + + +# Configure LLM for Anthropic Claude with extended thinking +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Setup agent with bash tool +register_tool("BashTool", BashTool) +agent = Agent(llm=llm, tools=[Tool(name="BashTool")]) + + +# Callback to display thinking blocks +def show_thinking(event: Event): + if isinstance(event, LLMConvertibleEvent): + message = event.to_llm_message() + if hasattr(message, "thinking_blocks") and message.thinking_blocks: + print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks") + for i, block in enumerate(message.thinking_blocks): + if isinstance(block, RedactedThinkingBlock): + print(f" Block {i + 1}: {block.data}") + elif isinstance(block, ThinkingBlock): + print(f" Block {i + 1}: {block.thinking}") + + +conversation = Conversation( + agent=agent, callbacks=[show_thinking], workspace=os.getcwd() +) + +conversation.send_message( + "Calculate compound interest for $10,000 at 5% annually, " + "compounded quarterly for 3 years. 
Show your work.", +) +conversation.run() + +conversation.send_message( + "Now, write that number to RESULTs.txt.", +) +conversation.run() +print("āœ… Done!") +``` + +```bash Running the Example +export LLM_API_KEY="your-anthropic-api-key" +export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" +cd agent-sdk +uv run python examples/01_standalone_sdk/22_anthropic_thinking.py +``` + +### How It Works + +The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages: + +```python highlight={6-11} +def show_thinking(event: Event): + if isinstance(event, LLMConvertibleEvent): + message = event.to_llm_message() + if hasattr(message, "thinking_blocks") and message.thinking_blocks: + print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks") + for block in message.thinking_blocks: + if isinstance(block, RedactedThinkingBlock): + print(f"Redacted: {block.data}") + elif isinstance(block, ThinkingBlock): + print(f"Thinking: {block.thinking}") + +conversation = Conversation(agent=agent, callbacks=[show_thinking]) +``` + +### Understanding Thinking Blocks + +Claude uses thinking blocks to reason through complex problems step-by-step. There are two types: + +- **`ThinkingBlock`** ([related Anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process +- **`RedactedThinkingBlock`** ([related Anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data + +By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time, giving you insight into how Claude is approaching the problem.
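Beyond printing, you may want to keep the captured reasoning for later review. Below is a minimal, SDK-independent sketch of that post-processing step; the input list stands in for strings a callback like `show_thinking` above would collect, and the log layout is purely illustrative:

```python
# Format collected thinking texts into a single markdown audit log.
# The input stands in for text gathered by a conversation callback;
# redacted blocks can be recorded as a placeholder string.
def format_thinking_log(blocks: list[str]) -> str:
    sections = [
        f"### Thought {i}\n\n{text}" for i, text in enumerate(blocks, start=1)
    ]
    return "\n\n".join(sections)

log = format_thinking_log(["First, compute the quarterly rate.", "[redacted]"])
print(log.splitlines()[0])  # ### Thought 1
```

The resulting string could then be written to a file once `conversation.run()` completes.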
+ +## OpenAI Reasoning via Responses API + + +This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) + + +OpenAI's latest models (e.g., GPT-5, GPT-5-Codex) support a [Responses API](https://platform.openai.com/docs/api-reference/responses) that provides access to the model's reasoning process. By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. + +```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py +""" +Example: Responses API path via LiteLLM in a Real Agent Conversation + +- Runs a real Agent/Conversation to verify /responses path works +- Demonstrates rendering of Responses reasoning within normal conversation events +""" + +from __future__ import annotations + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment."
+ +model = os.getenv("LLM_MODEL", "openhands/gpt-5-codex") +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + # Responses-path options + reasoning_effort="high", + # Logging / behavior tweaks + log_completions=False, + usage_id="agent", +) + +print("\n=== Agent Conversation using /responses path ===") +agent = get_default_agent( + llm=llm, + cli_mode=True, # disable browser tools for env simplicity +) + +llm_messages = [] # collect raw LLM-convertible messages for inspection + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), +) + +# Keep the tasks short for demo purposes +conversation.send_message("Read the repo and write one fact into FACTS.txt.") +conversation.run() + +conversation.send_message("Now delete FACTS.txt.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + ms = str(message) + print(f"Message {i}: {ms[:200]}{'...' if len(ms) > 200 else ''}") +``` + +```bash Running the Example +export LLM_API_KEY="your-openai-api-key" +export LLM_MODEL="openhands/gpt-5-codex" +cd agent-sdk +uv run python examples/01_standalone_sdk/23_responses_reasoning.py +``` + +### How It Works + +Configure the LLM with the `reasoning_effort` parameter to enable reasoning: + +```python highlight={5} +llm = LLM( + model="openhands/gpt-5-codex", + api_key=SecretStr(api_key), + base_url=base_url, + reasoning_effort="high", # Enable reasoning with effort level +) +``` + +The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of reasoning performed by the model. 
+ +Then capture reasoning traces in your callback: + +```python highlight={3-4} +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + msg = event.to_llm_message() + llm_messages.append(msg) +``` + +### Understanding Reasoning Traces + +The OpenAI Responses API provides reasoning traces that show how the model approached the problem. These traces are available in the LLM messages and can be inspected to understand the model's decision-making process. Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process. + +## Use Cases + +**Debugging**: Understand why the agent made specific decisions or took certain actions. + +**Transparency**: Show users how the AI arrived at its conclusions. + +**Quality Assurance**: Identify flawed reasoning patterns or logic errors. + +**Learning**: Study how models approach complex problems. + +## Next Steps + +- **[Interactive Terminal](/sdk/guides/interactive-terminal)** - Display reasoning in real-time +- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage and performance +- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities diff --git a/sdk/guides/llm-registry.mdx b/sdk/guides/llm-registry.mdx index 80bba8fd..424ed1ea 100644 --- a/sdk/guides/llm-registry.mdx +++ b/sdk/guides/llm-registry.mdx @@ -110,30 +110,24 @@ uv run python examples/01_standalone_sdk/05_use_llm_registry.py ``` ### Using the Registry -Get pre-configured LLMs from the registry by model name: +Create an LLM registry and add the LLM: -```python highlight={3-6} -from openhands.sdk import get_llm_from_registry - -llm = get_llm_from_registry( - model="openhands/claude-sonnet-4-5-20250929", - api_key=SecretStr(api_key) +```python +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), ) -``` -### Multi-Model Applications - -Use different models for different task complexities: - -```python
highlight={1-2,5,8}
-cheap_llm = get_llm_from_registry("openhands/gpt-4o-mini", api_key=api_key)
-powerful_llm = get_llm_from_registry("openhands/claude-sonnet-4-5-20250929", api_key=api_key)
+llm_registry = LLMRegistry()
+llm_registry.add(main_llm)
+```
 
-# Use cheap model for simple tasks
-simple_agent = Agent(llm=cheap_llm, tools=tools)
+Then retrieve the LLM by its usage ID:
 
-# Use powerful model for complex tasks
-complex_agent = Agent(llm=powerful_llm, tools=tools)
+```python
+llm = llm_registry.get("agent")
 ```
 
 ## Next Steps
diff --git a/sdk/guides/llm-routing.mdx b/sdk/guides/llm-routing.mdx
index 8bbe0332..16649036 100644
--- a/sdk/guides/llm-routing.mdx
+++ b/sdk/guides/llm-routing.mdx
@@ -1,8 +1,10 @@
 ---
-title: LLM Routing
-description: Route requests to different LLMs based on task requirements and complexity.
+title: Model Routing
+description: Route an agent's LLM requests to different models.
 ---
 
+This feature is under active development and more default routers will be available in future releases.
+ This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) @@ -114,21 +116,31 @@ cd agent-sdk uv run python examples/01_standalone_sdk/19_llm_routing.py ``` -### Setting Up Routing - -Configure routing rules to select models based on task complexity: +### Using the built-in MultimodalRouter -```python highlight={3-6} -from openhands.sdk import LLMRouter +Define the built-in rule-based `MultimodalRouter` that will route text-only requests to a secondary LLM and multimodal requests (with images) to the primary, multimodal-capable LLM: -router = LLMRouter( - default_llm=cheap_llm, - routing_rules={"complex": powerful_llm, "simple": cheap_llm} +```python +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="litellm_proxy/mistral/devstral-small-2507", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, ) -agent = Agent(llm=router, tools=tools) ``` -The router automatically selects the appropriate model based on task characteristics, optimizing both cost and performance. +You may define your own router by extending the `Router` class. See the [base class](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details. + ## Next Steps diff --git a/sdk/guides/model-reasoning.mdx b/sdk/guides/model-reasoning.mdx deleted file mode 100644 index 9b1fb2cd..00000000 --- a/sdk/guides/model-reasoning.mdx +++ /dev/null @@ -1,261 +0,0 @@ ---- -title: Model Reasoning -description: Access model reasoning traces from Anthropic thinking blocks and OpenAI responses API. 
---- - - -This example is available on GitHub: [examples/01_standalone_sdk/22_model_reasoning.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/22_model_reasoning.py) - - -View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. This example demonstrates two approaches: - -1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning -2. **OpenAI Responses Reasoning** - GPT's reasoning effort parameter - -```python icon="python" expandable examples/01_standalone_sdk/22_model_reasoning.py -""" -Example: Model Reasoning - Anthropic Thinking & OpenAI Responses - -Demonstrates two approaches to accessing model reasoning: -1. Anthropic's extended thinking feature with thinking blocks -2. OpenAI's Responses API with reasoning effort parameter - -Both approaches allow you to see the model's internal reasoning process -for transparency, debugging, and understanding decision-making. -""" - -from __future__ import annotations - -import os - -from pydantic import SecretStr - -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - RedactedThinkingBlock, - ThinkingBlock, - get_logger, -) -from openhands.sdk.tool import Tool, register_tool -from openhands.tools.execute_bash import BashTool -from openhands.tools.preset.default import get_default_agent - - -logger = get_logger(__name__) - - -def example_anthropic_thinking(): - """Demonstrate Anthropic Claude's extended thinking with thinking blocks.""" - print("\n" + "=" * 80) - print("EXAMPLE 1: Anthropic Extended Thinking") - print("=" * 80) - - api_key = os.getenv("LLM_API_KEY") - assert api_key is not None, "LLM_API_KEY environment variable is not set." 
- model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") - base_url = os.getenv("LLM_BASE_URL") - - llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), - ) - - # Setup agent with bash tool - register_tool("BashTool", BashTool) - agent = Agent(llm=llm, tools=[Tool(name="BashTool")]) - - # Callback to display thinking blocks - def show_thinking(event: Event): - if isinstance(event, LLMConvertibleEvent): - message = event.to_llm_message() - if hasattr(message, "thinking_blocks") and message.thinking_blocks: - print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks") - for i, block in enumerate(message.thinking_blocks): - if isinstance(block, RedactedThinkingBlock): - print(f" Block {i + 1}: {block.data}") - elif isinstance(block, ThinkingBlock): - preview = block.thinking[:100] - print(f" Block {i + 1}: {preview}...") - - conversation = Conversation( - agent=agent, callbacks=[show_thinking], workspace=os.getcwd() - ) - - conversation.send_message( - "Calculate compound interest for $10,000 at 5% annually, " - "compounded quarterly for 3 years. Show your work.", - ) - conversation.run() - - conversation.send_message( - "Now, write that number to ANTHROPIC_RESULT.txt.", - ) - conversation.run() - print("āœ… Anthropic thinking example complete!") - - -def example_openai_responses(): - """Demonstrate OpenAI's Responses API with reasoning effort.""" - print("\n" + "=" * 80) - print("EXAMPLE 2: OpenAI Responses Reasoning") - print("=" * 80) - - api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") - assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." 
- - model = os.getenv("LLM_MODEL", "openhands/gpt-5-codex") - base_url = os.getenv("LLM_BASE_URL") - - llm = LLM( - model=model, - api_key=SecretStr(api_key), - base_url=base_url, - # Responses-path options - reasoning_effort="high", - # Logging / behavior tweaks - log_completions=False, - usage_id="agent", - ) - - agent = get_default_agent( - llm=llm, - cli_mode=True, # disable browser tools for env simplicity - ) - - llm_messages = [] # collect raw LLM-convertible messages for inspection - - def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - msg = event.to_llm_message() - llm_messages.append(msg) - # Show reasoning if available - if hasattr(msg, "reasoning") and msg.reasoning: - preview = str(msg.reasoning)[:100] - print(f"šŸ’­ Reasoning detected: {preview}...") - - conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=os.getcwd(), - ) - - # Keep the tasks short for demo purposes - conversation.send_message("Create a file called OPENAI_RESULT.txt with a fun fact.") - conversation.run() - - conversation.send_message("Now delete OPENAI_RESULT.txt.") - conversation.run() - - print("=" * 80) - print(f"āœ… Collected {len(llm_messages)} LLM messages with reasoning traces") - print("āœ… OpenAI responses example complete!") - - -if __name__ == "__main__": - # Detect which model is being used and run appropriate example - model = os.getenv("LLM_MODEL", "") - - if "claude" in model.lower() or "anthropic" in model.lower(): - print("šŸ” Detected Anthropic model - running thinking blocks example") - example_anthropic_thinking() - elif "gpt" in model.lower() or "openai" in model.lower(): - print("šŸ” Detected OpenAI model - running responses reasoning example") - example_openai_responses() - else: - print("āš ļø Model not specified or unclear. Running both examples...") - print(" Set LLM_MODEL to 'claude-...' or 'gpt-...' 
to run specific example") - try: - example_anthropic_thinking() - except Exception as e: - print(f"āš ļø Anthropic example failed: {e}") - - try: - example_openai_responses() - except Exception as e: - print(f"āš ļø OpenAI example failed: {e}") - - print("\n" + "=" * 80) - print("āœ… All reasoning examples complete!") - print("=" * 80) -``` - -```bash Running the Example -# For Anthropic Claude -export LLM_API_KEY="your-anthropic-api-key" -export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" -cd agent-sdk -uv run python examples/01_standalone_sdk/22_model_reasoning.py - -# For OpenAI GPT -export LLM_API_KEY="your-openai-api-key" -export LLM_MODEL="openhands/gpt-5-codex" -cd agent-sdk -uv run python examples/01_standalone_sdk/22_model_reasoning.py -``` - -## Anthropic Thinking Blocks - -Access Claude's internal thinking process with thinking blocks: - -```python highlight={7-12} -def show_thinking(event: Event): - if isinstance(event, LLMConvertibleEvent): - message = event.to_llm_message() - if hasattr(message, "thinking_blocks") and message.thinking_blocks: - print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks") - for block in message.thinking_blocks: - if isinstance(block, RedactedThinkingBlock): - print(f"Redacted: {block.data}") - elif isinstance(block, ThinkingBlock): - print(f"Thinking: {block.thinking}") - -conversation = Conversation(agent=agent, callbacks=[show_thinking]) -``` - -Claude uses thinking blocks to reason through complex problems step-by-step, improving accuracy on difficult tasks. 
- -## OpenAI Responses Reasoning - -Access GPT's reasoning traces with the responses API: - -```python highlight={6} -llm = LLM( - model="gpt-5-codex", - api_key=SecretStr(api_key), - base_url=base_url, - # Enable reasoning with effort level - reasoning_effort="high", -) -``` - -Then capture reasoning in your callback: - -```python highlight={4-6} -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - msg = event.to_llm_message() - if hasattr(msg, "reasoning") and msg.reasoning: - print(f"šŸ’­ Reasoning: {msg.reasoning}") -``` - -## Use Cases - -**Debugging**: Understand why the agent made specific decisions or took certain actions. - -**Transparency**: Show users how the AI arrived at its conclusions. - -**Quality Assurance**: Identify flawed reasoning patterns or logic errors. - -**Learning**: Study how models approach complex problems. - -## Next Steps - -- **[Interactive Terminal](/sdk/guides/interactive-terminal)** - Display reasoning in real-time -- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage and performance -- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities From ae0881ec18f1406dd576aef44a229ec890b2f0dc Mon Sep 17 00:00:00 2001 From: all-hands-bot Date: Tue, 21 Oct 2025 20:40:13 +0000 Subject: [PATCH 04/58] sync(openapi): agent-sdk/main cab92fc --- openapi/agent-sdk.json | 387 ++++++++++++++++++----------------------- 1 file changed, 167 insertions(+), 220 deletions(-) diff --git a/openapi/agent-sdk.json b/openapi/agent-sdk.json index e499decb..cbb419e4 100644 --- a/openapi/agent-sdk.json +++ b/openapi/agent-sdk.json @@ -1489,14 +1489,14 @@ } } }, - "/api/bash/execute_bash_command": { + "/api/bash/start_bash_command": { "post": { "tags": [ "Bash" ], "summary": "Start Bash Command", - "description": "Execute a bash command", - "operationId": "start_bash_command_api_bash_execute_bash_command_post", + "description": "Execute a bash command in the background", + "operationId": 
"start_bash_command_api_bash_start_bash_command_post", "requestBody": { "content": { "application/json": { @@ -1531,6 +1531,48 @@ } } }, + "/api/bash/execute_bash_command": { + "post": { + "tags": [ + "Bash" + ], + "summary": "Execute Bash Command", + "description": "Execute a bash command and wait for a result", + "operationId": "execute_bash_command_api_bash_execute_bash_command_post", + "requestBody": { + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/ExecuteBashRequest" + } + } + }, + "required": true + }, + "responses": { + "200": { + "description": "Successful Response", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/BashOutput" + } + } + } + }, + "422": { + "description": "Validation Error", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/HTTPValidationError" + } + } + } + } + } + } + }, "/api/bash/bash_events": { "delete": { "tags": [ @@ -2015,7 +2057,7 @@ "description": "Optional AgentContext to initialize the agent with specific context.", "examples": [ { - "microagents": [ + "skills": [ { "content": "When you see this message, you should reply like you are a grumpy cat forced to use the internet.", "name": "repo.md", @@ -2100,32 +2142,13 @@ }, "AgentContext-Output": { "properties": { - "microagents": { + "skills": { "items": { - "oneOf": [ - { - "$ref": "#/components/schemas/KnowledgeMicroagent" - }, - { - "$ref": "#/components/schemas/RepoMicroagent" - }, - { - "$ref": "#/components/schemas/TaskMicroagent" - } - ], - "title": "BaseMicroagent", - "discriminator": { - "propertyName": "kind", - "mapping": { - "openhands__sdk__context__microagents__microagent__KnowledgeMicroagent-Output__1": "#/components/schemas/KnowledgeMicroagent", - "openhands__sdk__context__microagents__microagent__RepoMicroagent-Output__1": "#/components/schemas/RepoMicroagent", - "openhands__sdk__context__microagents__microagent__TaskMicroagent-Output__1": 
"#/components/schemas/TaskMicroagent" - } - } + "$ref": "#/components/schemas/Skill" }, "type": "array", - "title": "Microagents", - "description": "List of available microagents that can extend the user's input." + "title": "Skills", + "description": "List of available skills that can extend the user's input." }, "system_message_suffix": { "anyOf": [ @@ -2154,7 +2177,7 @@ }, "type": "object", "title": "AgentContext", - "description": "Central structure for managing prompt extension.\n\nAgentContext unifies all the contextual inputs that shape how the system\nextends and interprets user prompts. It combines both static environment\ndetails and dynamic, user-activated extensions from microagents.\n\nSpecifically, it provides:\n- **Repository context / Repo Microagents**: Information about the active codebase,\n branches, and repo-specific instructions contributed by repo microagents.\n- **Runtime context**: Current execution environment (hosts, working\n directory, secrets, date, etc.).\n- **Conversation instructions**: Optional task- or channel-specific rules\n that constrain or guide the agent\u2019s behavior across the session.\n- **Knowledge Microagents**: Extensible components that can be triggered by user input\n to inject knowledge or domain-specific guidance.\n\nTogether, these elements make AgentContext the primary container responsible\nfor assembling, formatting, and injecting all prompt-relevant context into\nLLM interactions." + "description": "Central structure for managing prompt extension.\n\nAgentContext unifies all the contextual inputs that shape how the system\nextends and interprets user prompts. 
It combines both static environment\ndetails and dynamic, user-activated extensions from skills.\n\nSpecifically, it provides:\n- **Repository context / Repo Skills**: Information about the active codebase,\n branches, and repo-specific instructions contributed by repo skills.\n- **Runtime context**: Current execution environment (hosts, working\n directory, secrets, date, etc.).\n- **Conversation instructions**: Optional task- or channel-specific rules\n that constrain or guide the agent\u2019s behavior across the session.\n- **Knowledge Skills**: Extensible components that can be triggered by user input\n to inject knowledge or domain-specific guidance.\n\nTogether, these elements make AgentContext the primary container responsible\nfor assembling, formatting, and injecting all prompt-relevant context into\nLLM interactions." }, "AgentErrorEvent": { "properties": { @@ -3076,13 +3099,13 @@ "confirmation_policy": { "$ref": "#/components/schemas/ConfirmationPolicyBase" }, - "activated_knowledge_microagents": { + "activated_knowledge_skills": { "items": { "type": "string" }, "type": "array", - "title": "Activated Knowledge Microagents", - "description": "List of activated knowledge microagents name" + "title": "Activated Knowledge Skills", + "description": "List of activated knowledge skills name" }, "stats": { "$ref": "#/components/schemas/ConversationStats-Output", @@ -3829,62 +3852,30 @@ "description" ], "title": "InputMetadata", - "description": "Metadata for task microagent inputs." + "description": "Metadata for task skill inputs." 
}, - "KnowledgeMicroagent": { + "KeywordTrigger": { "properties": { - "kind": { - "type": "string", - "const": "KnowledgeMicroagent", - "title": "Kind", - "default": "KnowledgeMicroagent" - }, - "name": { - "type": "string", - "title": "Name" - }, - "content": { - "type": "string", - "title": "Content" - }, - "source": { - "anyOf": [ - { - "type": "string" - }, - { - "type": "null" - } - ], - "title": "Source", - "description": "The source path or identifier of the microagent. When it is None, it is treated as a programmatically defined microagent." - }, "type": { "type": "string", - "enum": [ - "knowledge", - "repo", - "task" - ], + "const": "keyword", "title": "Type", - "default": "knowledge" + "default": "keyword" }, - "triggers": { + "keywords": { "items": { "type": "string" }, "type": "array", - "title": "Triggers", - "description": "List of triggers for the microagent" + "title": "Keywords" } }, "type": "object", "required": [ - "name", - "content" + "keywords" ], - "title": "KnowledgeMicroagent", - "description": "Knowledge micro-agents provide specialized expertise that's triggered by keywords\nin conversations.\n\nThey help with:\n- Language best practices\n- Framework guidelines\n- Common patterns\n- Tool usage" + "title": "KeywordTrigger", + "description": "Trigger for keyword-based skills.\n\nThese skills are activated when specific keywords appear in the user's query." }, "LLM": { "properties": { @@ -4622,13 +4613,13 @@ "$ref": "#/components/schemas/Message", "description": "The exact LLM message for this message event" }, - "activated_microagents": { + "activated_skills": { "items": { "type": "string" }, "type": "array", - "title": "Activated Microagents", - "description": "List of activated microagent name" + "title": "Activated Skills", + "description": "List of activated skill name" }, "extended_content": { "items": { @@ -5074,66 +5065,6 @@ "title": "RemoteWorkspace", "description": "Remote Workspace Implementation." 
}, - "RepoMicroagent": { - "properties": { - "kind": { - "type": "string", - "const": "RepoMicroagent", - "title": "Kind", - "default": "RepoMicroagent" - }, - "name": { - "type": "string", - "title": "Name" - }, - "content": { - "type": "string", - "title": "Content" - }, - "source": { - "anyOf": [ - { - "type": "string" - }, - { - "type": "null" - } - ], - "title": "Source", - "description": "The source path or identifier of the microagent. When it is None, it is treated as a programmatically defined microagent." - }, - "type": { - "type": "string", - "enum": [ - "knowledge", - "repo", - "task" - ], - "title": "Type", - "default": "repo" - }, - "mcp_tools": { - "anyOf": [ - { - "additionalProperties": true, - "type": "object" - }, - { - "type": "null" - } - ], - "title": "Mcp Tools", - "description": "MCP tools configuration for the microagent. It should conform to the MCPConfig schema: https://gofastmcp.com/clients/client#configuration-format" - } - }, - "type": "object", - "required": [ - "name", - "content" - ], - "title": "RepoMicroagent", - "description": "Microagent specialized for repository-specific knowledge and guidelines.\n\nRepoMicroagents are loaded from `.openhands/microagents/repo.md` files within\nrepositories and contain private, repository-specific instructions that are\nautomatically loaded when\nworking with that repository. They are ideal for:\n - Repository-specific guidelines\n - Team practices and conventions\n - Project-specific workflows\n - Custom documentation references" - }, "ResponseLatency": { "properties": { "model": { @@ -5293,6 +5224,84 @@ "title": "SetConfirmationPolicyRequest", "description": "Payload to set confirmation policy for a conversation." 
}, + "Skill": { + "properties": { + "name": { + "type": "string", + "title": "Name" + }, + "content": { + "type": "string", + "title": "Content" + }, + "trigger": { + "anyOf": [ + { + "oneOf": [ + { + "$ref": "#/components/schemas/KeywordTrigger" + }, + { + "$ref": "#/components/schemas/TaskTrigger" + } + ], + "discriminator": { + "propertyName": "type", + "mapping": { + "keyword": "#/components/schemas/KeywordTrigger", + "task": "#/components/schemas/TaskTrigger" + } + } + }, + { + "type": "null" + } + ], + "title": "Trigger" + }, + "source": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "title": "Source", + "description": "The source path or identifier of the skill. When it is None, it is treated as a programmatically defined skill." + }, + "mcp_tools": { + "anyOf": [ + { + "additionalProperties": true, + "type": "object" + }, + { + "type": "null" + } + ], + "title": "Mcp Tools", + "description": "MCP tools configuration for the skill (repo skills only). It should conform to the MCPConfig schema: https://gofastmcp.com/clients/client#configuration-format" + }, + "inputs": { + "items": { + "$ref": "#/components/schemas/InputMetadata" + }, + "type": "array", + "title": "Inputs", + "description": "Input metadata for the skill (task skills only)" + } + }, + "type": "object", + "required": [ + "name", + "content", + "trigger" + ], + "title": "Skill", + "description": "A skill provides specialized knowledge or functionality.\n\nSkills use triggers to determine when they should be activated:\n- None: Always active, for repository-specific guidelines\n- KeywordTrigger: Activated when keywords appear in user messages\n- TaskTrigger: Activated for specific tasks, may require user input" + }, "StartConversationRequest": { "properties": { "agent": { @@ -5522,69 +5531,6 @@ ], "title": "TaskItem" }, - "TaskMicroagent": { - "properties": { - "kind": { - "type": "string", - "const": "TaskMicroagent", - "title": "Kind", - "default": "TaskMicroagent" - 
}, - "name": { - "type": "string", - "title": "Name" - }, - "content": { - "type": "string", - "title": "Content" - }, - "source": { - "anyOf": [ - { - "type": "string" - }, - { - "type": "null" - } - ], - "title": "Source", - "description": "The source path or identifier of the microagent. When it is None, it is treated as a programmatically defined microagent." - }, - "type": { - "type": "string", - "enum": [ - "knowledge", - "repo", - "task" - ], - "title": "Type", - "default": "task" - }, - "triggers": { - "items": { - "type": "string" - }, - "type": "array", - "title": "Triggers", - "description": "List of triggers for the microagent" - }, - "inputs": { - "items": { - "$ref": "#/components/schemas/InputMetadata" - }, - "type": "array", - "title": "Inputs", - "description": "Input metadata for the microagent. Only exists for task microagents" - } - }, - "type": "object", - "required": [ - "name", - "content" - ], - "title": "TaskMicroagent", - "description": "TaskMicroagent is a special type of KnowledgeMicroagent that requires user input.\n\nThese microagents are triggered by a special format: \"/{agent_name}\"\nand will prompt the user for any required inputs before proceeding." - }, "TaskTrackerAction": { "properties": { "kind": { @@ -5651,6 +5597,29 @@ "title": "TaskTrackerObservation", "description": "This data class represents the result of a task tracking operation." }, + "TaskTrigger": { + "properties": { + "type": { + "type": "string", + "const": "task", + "title": "Type", + "default": "task" + }, + "triggers": { + "items": { + "type": "string" + }, + "type": "array", + "title": "Triggers" + } + }, + "type": "object", + "required": [ + "triggers" + ], + "title": "TaskTrigger", + "description": "Trigger for task-specific skills.\n\nThese skills are activated for specific task types and can modify prompts." 
+ }, "TextContent": { "properties": { "cache_prompt": { @@ -6173,28 +6142,6 @@ }, "title": "CondenserBase" }, - "BaseMicroagent": { - "oneOf": [ - { - "$ref": "#/components/schemas/KnowledgeMicroagent" - }, - { - "$ref": "#/components/schemas/RepoMicroagent" - }, - { - "$ref": "#/components/schemas/TaskMicroagent" - } - ], - "discriminator": { - "propertyName": "kind", - "mapping": { - "openhands__sdk__context__microagents__microagent__KnowledgeMicroagent-Input__1": "#/components/schemas/KnowledgeMicroagent", - "openhands__sdk__context__microagents__microagent__RepoMicroagent-Input__1": "#/components/schemas/RepoMicroagent", - "openhands__sdk__context__microagents__microagent__TaskMicroagent-Input__1": "#/components/schemas/TaskMicroagent" - } - }, - "title": "BaseMicroagent" - }, "BaseWorkspace": { "oneOf": [ { From bed1ef294a43049b255c6d0e5964de2ea5fb035e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 16:40:27 -0400 Subject: [PATCH 05/58] rename tab to sdk --- docs.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs.json b/docs.json index 069872e5..cfa85a12 100644 --- a/docs.json +++ b/docs.json @@ -174,7 +174,7 @@ ] }, { - "tab": "Agent SDK (v1)", + "tab": "SDK", "pages": [ "sdk/index", "sdk/getting-started", From 6fc72c71e60fca97eab59f4d30292255a665d190 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 17:00:05 -0400 Subject: [PATCH 06/58] merge convo cost --- docs.json | 7 +- sdk/guides/conversation-costs.mdx | 155 ----------- sdk/guides/llm-metrics.mdx | 130 ---------- sdk/guides/metrics.mdx | 417 ++++++++++++++++++++++++++++++ 4 files changed, 420 insertions(+), 289 deletions(-) delete mode 100644 sdk/guides/conversation-costs.mdx delete mode 100644 sdk/guides/llm-metrics.mdx create mode 100644 sdk/guides/metrics.mdx diff --git a/docs.json b/docs.json index cfa85a12..26770725 100644 --- a/docs.json +++ b/docs.json @@ -186,13 +186,13 @@ "sdk/guides/mcp", "sdk/guides/activate-skill", 
"sdk/guides/context-condenser", + "sdk/guides/metrics", { "group": "LLM Configuration", "pages": [ "sdk/guides/llm-registry", "sdk/guides/llm-routing", - "sdk/guides/llm-reasoning", - "sdk/guides/llm-metrics" + "sdk/guides/llm-reasoning" ] }, { @@ -201,8 +201,7 @@ "sdk/guides/persistence", "sdk/guides/pause-and-resume", "sdk/guides/confirmation-mode", - "sdk/guides/send-message-while-processing", - "sdk/guides/conversation-costs" + "sdk/guides/send-message-while-processing" ] }, { diff --git a/sdk/guides/conversation-costs.mdx b/sdk/guides/conversation-costs.mdx deleted file mode 100644 index 4b596138..00000000 --- a/sdk/guides/conversation-costs.mdx +++ /dev/null @@ -1,155 +0,0 @@ ---- -title: Conversation Costs -description: Analyze and optimize conversation token usage and associated costs. ---- - - -This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) - - -Analyze token usage and costs to identify expensive operations and optimize budget: - -```python icon="python" examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py -import os - -from pydantic import SecretStr -from tabulate import tabulate - -from openhands.sdk import ( - LLM, - Agent, - Conversation, - LLMSummarizingCondenser, - Message, - TextContent, - get_logger, -) -from openhands.sdk.tool.registry import register_tool -from openhands.sdk.tool.spec import Tool -from openhands.tools.execute_bash import ( - BashTool, -) - - -logger = get_logger(__name__) - -# Configure LLM using LLMRegistry -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") - -# Create LLM instance -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) - -llm_condenser = LLM( - model=model, - base_url=base_url, - api_key=SecretStr(api_key), - usage_id="condenser", -) - -# Tools -register_tool("BashTool", BashTool) - -condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) - -cwd = os.getcwd() -agent = Agent( - llm=llm, - tools=[ - Tool( - name="BashTool", - ), - ], - condenser=condenser, -) - -conversation = Conversation(agent=agent, workspace=cwd) -conversation.send_message( - message=Message( - role="user", - content=[TextContent(text="Please echo 'Hello!'")], - ) -) -conversation.run() - - -# Demonstrate extraneous costs part of the conversation -second_llm = LLM( - usage_id="demo-secondary", - model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", - api_key=SecretStr(api_key), -) -conversation.llm_registry.add(second_llm) -completion_response = second_llm.completion( - messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] -) - - -# Access total spend -spend = conversation.conversation_stats.get_combined_metrics() -print("\n=== Total Spend for Conversation ===\n") -print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") -if spend.accumulated_token_usage: - print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") - print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") - print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") - print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") - - -spend_per_usage = conversation.conversation_stats.usage_to_metrics -print("\n=== Spend Breakdown by Usage ID ===\n") -rows = [] -for usage_id, metrics in spend_per_usage.items(): - rows.append( - [ - 
usage_id, - f"${metrics.accumulated_cost:.6f}", - metrics.accumulated_token_usage.prompt_tokens - if metrics.accumulated_token_usage - else 0, - metrics.accumulated_token_usage.completion_tokens - if metrics.accumulated_token_usage - else 0, - ] - ) - -print( - tabulate( - rows, - headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], - tablefmt="github", - ) -) -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py -``` - -### Analyzing Costs - -Calculate total costs and per-message costs: - -```python highlight={1-4} -metrics = conversation.get_llm_metrics() -print(f"Total tokens: {metrics.input_tokens + metrics.output_tokens}") -print(f"Total cost: ${metrics.cost:.4f}") -print(f"Cost per message: ${metrics.cost / len(conversation.messages):.4f}") -``` - -Use this for budget control, cost attribution, and optimization. - -## Next Steps - -- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track detailed token usage -- **[Context Condenser](/sdk/guides/context-condenser)** - Reduce token usage diff --git a/sdk/guides/llm-metrics.mdx b/sdk/guides/llm-metrics.mdx deleted file mode 100644 index 21f72bd5..00000000 --- a/sdk/guides/llm-metrics.mdx +++ /dev/null @@ -1,130 +0,0 @@ ---- -title: Metrics Tracking -description: Track token usage, costs, and performance metrics for your agents. 
---- - - -This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) - - -Track token usage, costs, and performance metrics from LLM interactions: - -```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py -import os - -from pydantic import SecretStr - -from openhands.sdk import ( - LLM, - Agent, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.sdk.tool import Tool, register_tool -from openhands.tools.execute_bash import BashTool -from openhands.tools.file_editor import FileEditorTool - - -logger = get_logger(__name__) - -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) - -cwd = os.getcwd() -register_tool("BashTool", BashTool) -register_tool("FileEditorTool", FileEditorTool) -tools = [ - Tool(name="BashTool"), - Tool(name="FileEditorTool"), -] - -# Add MCP Tools -mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} - -# Agent -agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) - -llm_messages = [] # collect raw LLM messages - - -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) - - -# Conversation -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=cwd, -) - -logger.info("Starting conversation with MCP integration...") -conversation.send_message( - "Read https://github.com/OpenHands/OpenHands and write 3 facts " - "about the project into FACTS.txt." -) -conversation.run() - -conversation.send_message("Great! 
Now delete that file.") -conversation.run() - -print("=" * 100) -print("Conversation finished. Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") - -assert llm.metrics is not None -print( - f"Conversation finished. Final LLM metrics with details: {llm.metrics.model_dump()}" -) -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/01_standalone_sdk/13_get_llm_metrics.py -``` - -### Getting Metrics - -Access metrics directly from the LLM object after running the conversation: - -```python highlight={3-4} -conversation.run() - -assert llm.metrics is not None -print(f"Final LLM metrics: {llm.metrics.model_dump()}") -``` - -The `llm.metrics` object is an instance of the [Metrics class](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: - -- `accumulated_cost` - Total accumulated cost across all API calls -- `accumulated_token_usage` - Aggregated token usage with fields like: - - `prompt_tokens` - Number of input tokens processed - - `completion_tokens` - Number of output tokens generated - - `cache_read_tokens` - Cache hits (if supported by the model) - - `cache_write_tokens` - Cache writes (if supported by the model) - - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) - - `context_window` - Context window size used -- `costs` - List of individual cost records per API call -- `token_usages` - List of detailed token usage records per API call -- `response_latencies` - List of response latency metrics per API call - -For more details on the available metrics and methods, refer to the [source code](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). 
- -## Next Steps - -- **[Conversation Costs](/sdk/guides/conversation-costs)** - Calculate costs per conversation -- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing diff --git a/sdk/guides/metrics.mdx b/sdk/guides/metrics.mdx new file mode 100644 index 00000000..24cb6c96 --- /dev/null +++ b/sdk/guides/metrics.mdx @@ -0,0 +1,417 @@ +--- +title: Metrics Tracking +description: Track token usage, costs, and latency metrics for your agents. +--- + +## Overview + +The OpenHands SDK provides comprehensive metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: +- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. +- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). + +## Getting Metrics from Individual LLMs + + +This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) + + +Track token usage, costs, and performance metrics from LLM interactions: + +```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool(name="BashTool"), + Tool(name="FileEditorTool"), +] + +# Add MCP Tools +mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} + +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +assert llm.metrics is not None +print( + f"Conversation finished. 
Final LLM metrics with details: {llm.metrics.model_dump()}" +) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/13_get_llm_metrics.py +``` + +### Accessing Individual LLM Metrics + +Access metrics directly from the LLM object after running the conversation: + +```python highlight={3-4} +conversation.run() + +assert llm.metrics is not None +print(f"Final LLM metrics: {llm.metrics.model_dump()}") +``` + +The `llm.metrics` object is an instance of the [Metrics class](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: + +- `accumulated_cost` - Total accumulated cost across all API calls +- `accumulated_token_usage` - Aggregated token usage with fields like: + - `prompt_tokens` - Number of input tokens processed + - `completion_tokens` - Number of output tokens generated + - `cache_read_tokens` - Cache hits (if supported by the model) + - `cache_write_tokens` - Cache writes (if supported by the model) + - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) + - `context_window` - Context window size used +- `costs` - List of individual cost records per API call +- `token_usages` - List of detailed token usage records per API call +- `response_latencies` - List of response latency metrics per API call + +For more details on the available metrics and methods, refer to the [source code](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). + +## Using LLM Registry for Cost Tracking + + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + +The LLM Registry allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. 
This is particularly useful for tracking costs across different LLMs used in your application. + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +register_tool("BashTool", BashTool) +tools = [Tool(name="BashTool")] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +completion_response = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content +if completion_response.choices and completion_response.choices[0].message: # type: ignore + content = completion_response.choices[0].message.content # type: ignore + print(f"Direct completion response: {content}") +else: + print("No response content available") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/05_use_llm_registry.py +``` + +### How the LLM Registry Works + +Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: + +1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` +2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` +3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` +4. **Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID + +This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. 
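The per-usage bookkeeping described above can be sketched in a few lines of plain Python. This is a conceptual illustration only: the usage IDs and cost figures below are invented, and none of these lines are SDK API calls — real numbers come from each LLM's `metrics` object, as in the example above.

```python
from collections import defaultdict

# Hypothetical per-call cost records, keyed by the usage_id that made the call.
# In the SDK, each LLM instance accumulates these in its own `metrics` object.
cost_records = [
    ("agent", 0.0042),
    ("condenser", 0.0007),
    ("agent", 0.0031),
]

accumulated = defaultdict(float)
for usage_id, cost in cost_records:
    # Each usage_id accumulates independently, so spend stays attributable.
    accumulated[usage_id] += cost

total = sum(accumulated.values())
print({k: round(v, 4) for k, v in sorted(accumulated.items())})
print(f"combined: ${total:.4f}")
```

Keeping one accumulator per `usage_id` is what lets you attribute spend to the primary agent versus auxiliary LLMs (such as a condenser) while still being able to sum everything for a combined total.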
+ +### Getting Aggregated Conversation Costs + + +This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) + + +Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing. + +```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +import os + +from pydantic import SecretStr +from tabulate import tabulate + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + LLMSummarizingCondenser, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.registry import register_tool +from openhands.sdk.tool.spec import Tool +from openhands.tools.execute_bash import ( + BashTool, +) + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +llm_condenser = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="condenser", +) + +# Tools +register_tool("BashTool", BashTool) + +condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) + +cwd = os.getcwd() +agent = Agent( + llm=llm, + tools=[ + Tool( + name="BashTool", + ), + ], + condenser=condenser, +) + +conversation = Conversation(agent=agent, workspace=cwd) +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text="Please echo 'Hello!'")], + ) +) +conversation.run() + + +# Demonstrate extraneous costs part of the conversation +second_llm = LLM( + usage_id="demo-secondary", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +conversation.llm_registry.add(second_llm) +completion_response = second_llm.completion( + messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] +) + + +# Access total spend +spend = conversation.conversation_stats.get_combined_metrics() +print("\n=== Total Spend for Conversation ===\n") +print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt Tokens: {spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") + print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") + print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + + +spend_per_usage = conversation.conversation_stats.usage_to_metrics +print("\n=== Spend Breakdown by Usage ID ===\n") +rows = [] +for usage_id, metrics in spend_per_usage.items(): + rows.append( + [ + 
usage_id, + f"${metrics.accumulated_cost:.6f}", + metrics.accumulated_token_usage.prompt_tokens + if metrics.accumulated_token_usage + else 0, + metrics.accumulated_token_usage.completion_tokens + if metrics.accumulated_token_usage + else 0, + ] + ) + +print( + tabulate( + rows, + headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], + tablefmt="github", + ) +) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +``` + +### Understanding Conversation Stats + +The `conversation.conversation_stats` object provides comprehensive cost tracking across all LLMs used in a conversation. It is an instance of the [ConversationStats class](https://github.com/All-Hands-AI/agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: + +#### Key Methods and Properties + +- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. + +- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. This gives you the total cost of the entire conversation. + +- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. 
+ +```python +# Get combined metrics for the entire conversation +total_metrics = conversation.conversation_stats.get_combined_metrics() +print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") + +# Get metrics for a specific LLM by usage ID +agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") +print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") + +# Access all usage IDs and their metrics +for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): + print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") +``` + +## Next Steps + +- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs +- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models +- **[Conversation Costs](/sdk/guides/conversation-costs)** - Calculate costs per conversation From c26f0a48bc4c6c1a11cd091b347f5ad2313d643e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 17:16:23 -0400 Subject: [PATCH 07/58] merge confirmation mode --- docs.json | 7 +- sdk/guides/confirmation-mode.mdx | 193 -------- ...-resume.mdx => convo-pause-and-resume.mdx} | 0 ...{persistence.mdx => convo-persistence.mdx} | 0 sdk/guides/security-analyzer.mdx | 174 ------- sdk/guides/security.mdx | 440 ++++++++++++++++++ 6 files changed, 443 insertions(+), 371 deletions(-) delete mode 100644 sdk/guides/confirmation-mode.mdx rename sdk/guides/{pause-and-resume.mdx => convo-pause-and-resume.mdx} (100%) rename sdk/guides/{persistence.mdx => convo-persistence.mdx} (100%) delete mode 100644 sdk/guides/security-analyzer.mdx create mode 100644 sdk/guides/security.mdx diff --git a/docs.json b/docs.json index 26770725..8867052f 100644 --- a/docs.json +++ b/docs.json @@ -186,6 +186,7 @@ "sdk/guides/mcp", "sdk/guides/activate-skill", "sdk/guides/context-condenser", + "sdk/guides/security", "sdk/guides/metrics", { "group": "LLM Configuration", @@ -198,9 +199,8 
@@ { "group": "Conversation Management", "pages": [ - "sdk/guides/persistence", + "sdk/guides/convo-persistence", "sdk/guides/pause-and-resume", - "sdk/guides/confirmation-mode", "sdk/guides/send-message-while-processing" ] }, @@ -211,8 +211,7 @@ "sdk/guides/planning-agent-workflow", "sdk/guides/browser-use", "sdk/guides/image-input", - "sdk/guides/custom-secrets", - "sdk/guides/security-analyzer" + "sdk/guides/custom-secrets" ] }, { diff --git a/sdk/guides/confirmation-mode.mdx b/sdk/guides/confirmation-mode.mdx deleted file mode 100644 index fc30185a..00000000 --- a/sdk/guides/confirmation-mode.mdx +++ /dev/null @@ -1,193 +0,0 @@ ---- -title: Confirmation Mode -description: Require user approval before executing actions for safe agent operation. ---- - - -This example is available on GitHub: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) - - -Require user approval before executing agent actions for safe operation: - -```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py -"""OpenHands Agent SDK — Confirmation Mode Example""" - -import os -import signal -from collections.abc import Callable - -from pydantic import SecretStr - -from openhands.sdk import LLM, BaseConversation, Conversation -from openhands.sdk.conversation.state import AgentExecutionStatus, ConversationState -from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm -from openhands.tools.preset.default import get_default_agent - - -# Make ^C a clean exit instead of a stack trace -signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) - - -def _print_action_preview(pending_actions) -> None: - print(f"\nšŸ” Agent created {len(pending_actions)} action(s) awaiting confirmation:") - for i, action in enumerate(pending_actions, start=1): - snippet = str(action.action)[:100].replace("\n", " 
") - print(f" {i}. {action.tool_name}: {snippet}...") - - -def confirm_in_console(pending_actions) -> bool: - """ - Return True to approve, False to reject. - Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). - """ - _print_action_preview(pending_actions) - while True: - try: - ans = ( - input("\nDo you want to execute these actions? (yes/no): ") - .strip() - .lower() - ) - except (EOFError, KeyboardInterrupt): - print("\nāŒ No input received; rejecting by default.") - return False - - if ans in ("yes", "y"): - print("āœ… Approved — executing actions…") - return True - if ans in ("no", "n"): - print("āŒ Rejected — skipping actions…") - return False - print("Please enter 'yes' or 'no'.") - - -def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: - """ - Drive the conversation until FINISHED. - If WAITING_FOR_CONFIRMATION, ask the confirmer; - on reject, call reject_pending_actions(). - Preserves original error if agent waits but no actions exist. - """ - while conversation.state.agent_status != AgentExecutionStatus.FINISHED: - if ( - conversation.state.agent_status - == AgentExecutionStatus.WAITING_FOR_CONFIRMATION - ): - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not pending: - raise RuntimeError( - "āš ļø Agent is waiting for confirmation but no pending actions " - "were found. This should not happen." - ) - if not confirmer(pending): - conversation.reject_pending_actions("User rejected the actions") - # Let the agent produce a new step or finish - continue - - print("ā–¶ļø Running conversation.run()…") - conversation.run() - - -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
-model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) - -add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) -if add_security_analyzer: - print("Agent security analyzer added.") -agent = get_default_agent(llm=llm, add_security_analyzer=add_security_analyzer) -conversation = Conversation(agent=agent, workspace=os.getcwd()) - -# 1) Confirmation mode ON -conversation.set_confirmation_policy(AlwaysConfirm()) -print("\n1) Command that will likely create actions…") -conversation.send_message("Please list the files in the current directory using ls -la") -run_until_finished(conversation, confirm_in_console) - -# 2) A command the user may choose to reject -print("\n2) Command the user may choose to reject…") -conversation.send_message("Please create a file called 'dangerous_file.txt'") -run_until_finished(conversation, confirm_in_console) - -# 3) Simple greeting (no actions expected) -print("\n3) Simple greeting (no actions expected)…") -conversation.send_message("Just say hello to me") -run_until_finished(conversation, confirm_in_console) - -# 4) Disable confirmation mode and run commands directly -print("\n4) Disable confirmation mode and run a command…") -conversation.set_confirmation_policy(NeverConfirm()) -conversation.send_message("Please echo 'Hello from confirmation mode example!'") -conversation.run() - -conversation.send_message( - "Please delete any file that was created during this conversation." 
-) -conversation.run() - -print("\n=== Example Complete ===") -print("Key points:") -print( - "- conversation.run() creates actions; confirmation mode " - "sets agent_status=WAITING_FOR_CONFIRMATION" -) -print("- User confirmation is handled via a single reusable function") -print("- Rejection uses conversation.reject_pending_actions() and the loop continues") -print("- Simple responses work normally without actions") -print("- Confirmation policy is toggled with conversation.set_confirmation_policy()") -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/01_standalone_sdk/04_confirmation_mode_example.py -``` - -### Confirmation Policy - -Set the confirmation policy when creating the agent: - -```python highlight={4} -from openhands.sdk.security.confirmation_policy import AlwaysConfirm - -agent = Agent(llm=llm, tools=tools, - confirmation_policy=AlwaysConfirm()) -``` - -### Custom Confirmation Handler - -Implement your approval logic by checking conversation status: - -```python highlight={2-3,5} -while conversation.state.agent_status != AgentExecutionStatus.FINISHED: - if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not confirm_in_console(pending): - conversation.reject_pending_actions("User rejected") - continue - conversation.run() -``` - -### Rejecting Actions - -Provide feedback when rejecting to help the agent try a different approach: - -```python highlight={2-4} -if not user_approved: - conversation.reject_pending_actions( - "User rejected because actions seem too risky. Please try a safer approach." 
- ) -``` - -## Next Steps - -- **[Security Analyzer](/sdk/guides/security-analyzer)** - Automated security checks -- **[Custom Secrets](/sdk/guides/custom-secrets)** - Secure credential management diff --git a/sdk/guides/pause-and-resume.mdx b/sdk/guides/convo-pause-and-resume.mdx similarity index 100% rename from sdk/guides/pause-and-resume.mdx rename to sdk/guides/convo-pause-and-resume.mdx diff --git a/sdk/guides/persistence.mdx b/sdk/guides/convo-persistence.mdx similarity index 100% rename from sdk/guides/persistence.mdx rename to sdk/guides/convo-persistence.mdx diff --git a/sdk/guides/security-analyzer.mdx b/sdk/guides/security-analyzer.mdx deleted file mode 100644 index 5e69d104..00000000 --- a/sdk/guides/security-analyzer.mdx +++ /dev/null @@ -1,174 +0,0 @@ ---- -title: LLM Security Analyzer -description: Analyze actions for security risks before execution using LLM-based analysis. ---- - - -This example is available on GitHub: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py) - - -Automatically analyze agent actions for security risks before execution: - -```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py -"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified) - -This example shows how to use the LLMSecurityAnalyzer to automatically -evaluate security risks of actions before execution. 
-""" - -import os -import signal -from collections.abc import Callable - -from pydantic import SecretStr - -from openhands.sdk import LLM, Agent, BaseConversation, Conversation -from openhands.sdk.conversation.state import AgentExecutionStatus, ConversationState -from openhands.sdk.security.confirmation_policy import ConfirmRisky -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer -from openhands.sdk.tool import Tool, register_tool -from openhands.tools.execute_bash import BashTool -from openhands.tools.file_editor import FileEditorTool - - -# Clean ^C exit: no stack trace noise -signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) - - -def _print_blocked_actions(pending_actions) -> None: - print(f"\nšŸ”’ Security analyzer blocked {len(pending_actions)} high-risk action(s):") - for i, action in enumerate(pending_actions, start=1): - snippet = str(action.action)[:100].replace("\n", " ") - print(f" {i}. {action.tool_name}: {snippet}...") - - -def confirm_high_risk_in_console(pending_actions) -> bool: - """ - Return True to approve, False to reject. - Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. - """ - _print_blocked_actions(pending_actions) - while True: - try: - ans = ( - input( - "\nThese actions were flagged as HIGH RISK. " - "Do you want to execute them anyway? (yes/no): " - ) - .strip() - .lower() - ) - except (EOFError, KeyboardInterrupt): - print("\nāŒ No input received; rejecting by default.") - return False - - if ans in ("yes", "y"): - print("āœ… Approved — executing high-risk actions...") - return True - if ans in ("no", "n"): - print("āŒ Rejected — skipping high-risk actions...") - return False - print("Please enter 'yes' or 'no'.") - - -def run_until_finished_with_security( - conversation: BaseConversation, confirmer: Callable[[list], bool] -) -> None: - """ - Drive the conversation until FINISHED. - - If WAITING_FOR_CONFIRMATION: ask the confirmer. 
- * On approve: set agent_status = IDLE (keeps original example’s behavior). - * On reject: conversation.reject_pending_actions(...). - - If WAITING but no pending actions: print warning and set IDLE (matches original). - """ - while conversation.state.agent_status != AgentExecutionStatus.FINISHED: - if ( - conversation.state.agent_status - == AgentExecutionStatus.WAITING_FOR_CONFIRMATION - ): - pending = ConversationState.get_unmatched_actions(conversation.state.events) - if not pending: - raise RuntimeError( - "āš ļø Agent is waiting for confirmation but no pending actions " - "were found. This should not happen." - ) - if not confirmer(pending): - conversation.reject_pending_actions("User rejected high-risk actions") - continue - - print("ā–¶ļø Running conversation.run()...") - conversation.run() - - -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="security-analyzer", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) - -# Tools -register_tool("BashTool", BashTool) -register_tool("FileEditorTool", FileEditorTool) -tools = [ - Tool( - name="BashTool", - ), - Tool(name="FileEditorTool"), -] - -# Agent with security analyzer -security_analyzer = LLMSecurityAnalyzer() -agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) - -# Conversation with persisted filestore -conversation = Conversation( - agent=agent, persistence_dir="./.conversations", workspace="." 
-) -conversation.set_confirmation_policy(ConfirmRisky()) - -print("\n1) Safe command (LOW risk - should execute automatically)...") -conversation.send_message("List files in the current directory") -conversation.run() - -print("\n2) Potentially risky command (may require confirmation)...") -conversation.send_message( - "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" -) -run_until_finished_with_security(conversation, confirm_high_risk_in_console) -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/01_standalone_sdk/16_llm_security_analyzer.py -``` - -### Security Analyzer Configuration - -Create an LLM-based security analyzer to review actions before execution: - -```python highlight={3-4} -from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer - -security_llm = LLM(model="gpt-4o-mini", api_key=SecretStr(api_key)) -security_analyzer = LLMSecurityAnalyzer(llm=security_llm) - -agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) -``` - -The security analyzer: -- Reviews each action before execution -- Flags potentially dangerous operations -- Can be configured with custom security policies -- Uses a separate LLM to avoid conflicts with the main agent - -## Next Steps - -- **[Confirmation Mode](/sdk/guides/confirmation-mode)** - Add manual approval for actions -- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools diff --git a/sdk/guides/security.mdx b/sdk/guides/security.mdx new file mode 100644 index 00000000..f63173fa --- /dev/null +++ b/sdk/guides/security.mdx @@ -0,0 +1,440 @@ +--- +title: Security & Action Confirmation +description: Control agent action execution through confirmation policies and a security analyzer. +--- + +Agent actions can be controlled through two complementary mechanisms: **confirmation policies** that determine when user approval is required, and a **security analyzer** that evaluates action risk levels.
Together, they provide flexible control over agent behavior while maintaining safety. + +## Confirmation Policy + +Confirmation policies control whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission for actions. + + +Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) + + +### Basic Confirmation Example + +Require user approval before executing agent actions: + +```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py +"""OpenHands Agent SDK — Confirmation Mode Example""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, BaseConversation, Conversation +from openhands.sdk.conversation.state import AgentExecutionStatus, ConversationState +from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm +from openhands.tools.preset.default import get_default_agent + + +# Make ^C a clean exit instead of a stack trace +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_action_preview(pending_actions) -> None: + print(f"\nšŸ” Agent created {len(pending_actions)} action(s) awaiting confirmation:") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). + """ + _print_action_preview(pending_actions) + while True: + try: + ans = ( + input("\nDo you want to execute these actions? 
(yes/no): ") + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\nāŒ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("āœ… Approved — executing actions…") + return True + if ans in ("no", "n"): + print("āŒ Rejected — skipping actions…") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: + """ + Drive the conversation until FINISHED. + If WAITING_FOR_CONFIRMATION, ask the confirmer; + on reject, call reject_pending_actions(). + Preserves original error if agent waits but no actions exist. + """ + while conversation.state.agent_status != AgentExecutionStatus.FINISHED: + if ( + conversation.state.agent_status + == AgentExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "āš ļø Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected the actions") + # Let the agent produce a new step or finish + continue + + print("ā–¶ļø Running conversation.run()…") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) +if add_security_analyzer: + print("Agent security analyzer added.") +agent = get_default_agent(llm=llm, add_security_analyzer=add_security_analyzer) +conversation = Conversation(agent=agent, workspace=os.getcwd()) + +# 1) Confirmation mode ON +conversation.set_confirmation_policy(AlwaysConfirm()) +print("\n1) Command that will likely create actions…") +conversation.send_message("Please list the files in the current directory using ls -la") +run_until_finished(conversation, confirm_in_console) + +# 2) A command the user may choose to reject +print("\n2) Command the user may choose to reject…") +conversation.send_message("Please create a file called 'dangerous_file.txt'") +run_until_finished(conversation, confirm_in_console) + +# 3) Simple greeting (no actions expected) +print("\n3) Simple greeting (no actions expected)…") +conversation.send_message("Just say hello to me") +run_until_finished(conversation, confirm_in_console) + +# 4) Disable confirmation mode and run commands directly +print("\n4) Disable confirmation mode and run a command…") +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Please echo 'Hello from confirmation mode example!'") +conversation.run() + +conversation.send_message( + "Please delete any file that was created during this conversation." 
+) +conversation.run() + +print("\n=== Example Complete ===") +print("Key points:") +print( + "- conversation.run() creates actions; confirmation mode " + "sets agent_status=WAITING_FOR_CONFIRMATION" +) +print("- User confirmation is handled via a single reusable function") +print("- Rejection uses conversation.reject_pending_actions() and the loop continues") +print("- Simple responses work normally without actions") +print("- Confirmation policy is toggled with conversation.set_confirmation_policy()") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/04_confirmation_mode_example.py +``` + +### Setting Confirmation Policy + +Set the confirmation policy on your conversation: + +```python highlight={4} +from openhands.sdk.security.confirmation_policy import AlwaysConfirm + +conversation = Conversation(agent=agent, workspace=".") +conversation.set_confirmation_policy(AlwaysConfirm()) +``` + +Available policies: +- **`AlwaysConfirm()`** - Require approval for all actions +- **`NeverConfirm()`** - Execute all actions without approval +- **`ConfirmRisky()`** - Only require approval for risky actions (requires security analyzer) + +### Custom Confirmation Handler + +Implement your approval logic by checking conversation status: + +```python highlight={2-3,5} +while conversation.state.agent_status != AgentExecutionStatus.FINISHED: + if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not confirm_in_console(pending): + conversation.reject_pending_actions("User rejected") + continue + conversation.run() +``` + +### Rejecting Actions + +Provide feedback when rejecting to help the agent try a different approach: + +```python highlight={2-4} +if not user_approved: + conversation.reject_pending_actions( + "User rejected because actions seem too risky. Please try a safer approach." 
+ ) +``` + +--- + +## Security Analyzer + +Security analyzers evaluate the risk of agent actions before execution, helping protect against potentially dangerous operations. They analyze each action and assign a security risk level: + +- **LOW** - Safe operations with minimal security impact +- **MEDIUM** - Moderate security impact, review recommended +- **HIGH** - Significant security impact, requires confirmation +- **UNKNOWN** - Risk level could not be determined + +Security analyzers work in conjunction with confirmation policies (like `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations. + +### LLM Security Analyzer + +The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions. + + +Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py) + + +#### Security Analyzer Example + +Automatically analyze agent actions for security risks before execution: + +```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py +"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified) + +This example shows how to use the LLMSecurityAnalyzer to automatically +evaluate security risks of actions before execution. 
+""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, BaseConversation, Conversation +from openhands.sdk.conversation.state import AgentExecutionStatus, ConversationState +from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool, register_tool +from openhands.tools.execute_bash import BashTool +from openhands.tools.file_editor import FileEditorTool + + +# Clean ^C exit: no stack trace noise +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_blocked_actions(pending_actions) -> None: + print(f"\nšŸ”’ Security analyzer blocked {len(pending_actions)} high-risk action(s):") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_high_risk_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. + """ + _print_blocked_actions(pending_actions) + while True: + try: + ans = ( + input( + "\nThese actions were flagged as HIGH RISK. " + "Do you want to execute them anyway? (yes/no): " + ) + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\nāŒ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("āœ… Approved — executing high-risk actions...") + return True + if ans in ("no", "n"): + print("āŒ Rejected — skipping high-risk actions...") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished_with_security( + conversation: BaseConversation, confirmer: Callable[[list], bool] +) -> None: + """ + Drive the conversation until FINISHED. + - If WAITING_FOR_CONFIRMATION: ask the confirmer. 
+ * On approve: set agent_status = IDLE (keeps original example's behavior). + * On reject: conversation.reject_pending_actions(...). + - If WAITING but no pending actions: print warning and set IDLE (matches original). + """ + while conversation.state.agent_status != AgentExecutionStatus.FINISHED: + if ( + conversation.state.agent_status + == AgentExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "āš ļø Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected high-risk actions") + continue + + print("ā–¶ļø Running conversation.run()...") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +register_tool("BashTool", BashTool) +register_tool("FileEditorTool", FileEditorTool) +tools = [ + Tool( + name="BashTool", + ), + Tool(name="FileEditorTool"), +] + +# Agent with security analyzer +security_analyzer = LLMSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) + +# Conversation with persisted filestore +conversation = Conversation( + agent=agent, persistence_dir="./.conversations", workspace="." 
+) +conversation.set_confirmation_policy(ConfirmRisky()) + +print("\n1) Safe command (LOW risk - should execute automatically)...") +conversation.send_message("List files in the current directory") +conversation.run() + +print("\n2) Potentially risky command (may require confirmation)...") +conversation.send_message( + "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" +) +run_until_finished_with_security(conversation, confirm_high_risk_in_console) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/16_llm_security_analyzer.py +``` + +#### Security Analyzer Configuration + +Create an LLM-based security analyzer to review actions before execution: + +```python highlight={9} +from openhands.sdk import LLM +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +security_llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +security_analyzer = LLMSecurityAnalyzer(llm=security_llm) +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) +``` + +The security analyzer: +- Reviews each action before execution +- Flags potentially dangerous operations +- Can be configured with custom security policies +- Uses a separate LLM to avoid conflicts with the main agent + +### Custom Security Analyzer Implementation + +You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. This allows you to implement custom security logic tailored to your specific requirements. 
+ +#### Creating a Custom Analyzer + +To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: + +```python +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.event.llm_convertible import ActionEvent + +class CustomSecurityAnalyzer(SecurityAnalyzerBase): + """Custom security analyzer with domain-specific rules.""" + + def security_risk(self, action: ActionEvent) -> SecurityRisk: + """Evaluate security risk based on custom rules. + + Args: + action: The ActionEvent to analyze + + Returns: + SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN) + """ + # Example: Check for specific dangerous patterns + action_str = str(action.action.model_dump()).lower() if action.action else "" + + # High-risk patterns + if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']): + return SecurityRisk.HIGH + + # Medium-risk patterns + if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']): + return SecurityRisk.MEDIUM + + # Default to low risk + return SecurityRisk.LOW + +# Use your custom analyzer +security_analyzer = CustomSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) +``` + +For more details on the base class implementation, see the [source code](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). 
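A custom analyzer is easiest to trust when its decision logic can be tested on its own. The sketch below is deliberately SDK-independent: it mirrors the four risk levels and the pattern rules from the analyzer above, and adds a `ConfirmRisky`-style gate. The names and pattern tables here are illustrative assumptions, not the SDK's actual implementation.

```python
from enum import Enum


class Risk(Enum):
    """Mirrors the four risk levels described above."""

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    UNKNOWN = "unknown"


# Illustrative pattern tables, in the same spirit as the custom analyzer example
HIGH_RISK_PATTERNS = ("rm -rf", "sudo", "chmod 777")
MEDIUM_RISK_PATTERNS = ("curl", "wget", "git clone")


def classify(command: str) -> Risk:
    """Toy pattern-based risk classifier for a shell command."""
    lowered = command.lower()
    if any(p in lowered for p in HIGH_RISK_PATTERNS):
        return Risk.HIGH
    if any(p in lowered for p in MEDIUM_RISK_PATTERNS):
        return Risk.MEDIUM
    return Risk.LOW


def needs_confirmation(risk: Risk) -> bool:
    """ConfirmRisky-style gate: only HIGH or undetermined risk pauses the agent."""
    return risk in (Risk.HIGH, Risk.UNKNOWN)


if __name__ == "__main__":
    for cmd in ("ls -la", "git clone https://example.com/repo.git", "sudo rm -rf /tmp/x"):
        risk = classify(cmd)
        print(f"{cmd!r} -> {risk.name}, confirm={needs_confirmation(risk)}")
```

Keeping classification (`classify`) separate from the approval decision (`needs_confirmation`) matches the split the SDK draws between the security analyzer and the confirmation policy, and makes each half independently unit-testable.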
+ +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools +- **[Custom Secrets](/sdk/guides/custom-secrets)** - Secure credential management From 1e64616535060bdb0b7b057e2778b1de1667a902 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Tue, 21 Oct 2025 21:16:42 +0000 Subject: [PATCH 08/58] docs: sync code blocks from agent-sdk examples Synced from agent-sdk ref: main --- sdk/guides/security.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/security.mdx b/sdk/guides/security.mdx index f63173fa..5cdcd751 100644 --- a/sdk/guides/security.mdx +++ b/sdk/guides/security.mdx @@ -293,7 +293,7 @@ def run_until_finished_with_security( """ Drive the conversation until FINISHED. - If WAITING_FOR_CONFIRMATION: ask the confirmer. - * On approve: set agent_status = IDLE (keeps original example's behavior). + * On approve: set agent_status = IDLE (keeps original example’s behavior). * On reject: conversation.reject_pending_actions(...). - If WAITING but no pending actions: print warning and set IDLE (matches original). """ From 8461f56610c64236e272917a2933265fb799ee91 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 17:18:23 -0400 Subject: [PATCH 09/58] link to llm registry --- sdk/guides/metrics.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/metrics.mdx b/sdk/guides/metrics.mdx index 24cb6c96..3713dd09 100644 --- a/sdk/guides/metrics.mdx +++ b/sdk/guides/metrics.mdx @@ -138,7 +138,7 @@ For more details on the available metrics and methods, refer to the [source code This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py)
-The LLM Registry allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. +The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. ```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py import os From fc560a94c9c714a2928fafdb821acc39df8f3f89 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 17:46:31 -0400 Subject: [PATCH 10/58] rename and some improvements --- docs.json | 9 ++-- sdk/guides/{async.mdx => convo-async.mdx} | 2 +- sdk/guides/convo-persistence.mdx | 48 ++++++++++++++----- ...x => convo-send-message-while-running.mdx} | 37 +++++++++----- sdk/guides/{activate-skill.mdx => skill.mdx} | 0 5 files changed, 67 insertions(+), 29 deletions(-) rename sdk/guides/{async.mdx => convo-async.mdx} (99%) rename sdk/guides/{send-message-while-processing.mdx => convo-send-message-while-running.mdx} (84%) rename sdk/guides/{activate-skill.mdx => skill.mdx} (100%) diff --git a/docs.json b/docs.json index 8867052f..1581a8bc 100644 --- a/docs.json +++ b/docs.json @@ -184,12 +184,12 @@ "sdk/guides/hello-world", "sdk/guides/custom-tools", "sdk/guides/mcp", - "sdk/guides/activate-skill", + "sdk/guides/skill", "sdk/guides/context-condenser", "sdk/guides/security", "sdk/guides/metrics", { - "group": "LLM Configuration", + "group": "LLM Features", "pages": [ "sdk/guides/llm-registry", "sdk/guides/llm-routing", @@ -200,8 +200,9 @@ "group": "Conversation Management", "pages": [ "sdk/guides/convo-persistence", - "sdk/guides/pause-and-resume", - "sdk/guides/send-message-while-processing" + "sdk/guides/convo-pause-and-resume", + "sdk/guides/convo-send-message-while-running", + "sdk/guides/convo-async" ] }, { diff 
--git a/sdk/guides/async.mdx b/sdk/guides/convo-async.mdx similarity index 99% rename from sdk/guides/async.mdx rename to sdk/guides/convo-async.mdx index b416e30a..b6e80c96 100644 --- a/sdk/guides/async.mdx +++ b/sdk/guides/convo-async.mdx @@ -1,5 +1,5 @@ --- -title: Async Operations +title: Conversation with Async description: Use async/await for concurrent agent operations and non-blocking execution. --- diff --git a/sdk/guides/convo-persistence.mdx b/sdk/guides/convo-persistence.mdx index 772f9cd3..fd45314f 100644 --- a/sdk/guides/convo-persistence.mdx +++ b/sdk/guides/convo-persistence.mdx @@ -122,37 +122,61 @@ uv run python examples/01_standalone_sdk/10_persistence.py Create a conversation with a unique ID to enable persistence: -```python highlight={3-6} +```python highlight={3-4,10-11} import uuid -conversation_id = str(uuid.uuid4()) +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + conversation = Conversation( agent=agent, - conversation_id=conversation_id + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, ) - conversation.send_message("Start long task") conversation.run() # State automatically saved ``` ### Restoring State -Restore a conversation using the same ID: +Restore a conversation using the same ID and persistence directory: -```python highlight={3-4} +```python highlight={9-10} # Later, in a different session -restored = Conversation( +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( agent=agent, - conversation_id=conversation_id # Same ID as before + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, ) -restored.send_message("Continue task") -restored.run() # Continues from saved state +conversation.send_message("Continue task") +conversation.run() # Continues from saved state ``` -State includes message 
history, events, agent state, and tool outputs. +### What Gets Persisted + +The conversation state includes comprehensive information that allows seamless restoration: + +- **Message History**: Complete event log including user messages, agent responses, and system events +- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters +- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings +- **Tool Outputs**: Results from bash commands, file operations, and other tool executions +- **Statistics**: LLM usage metrics like token counts and API calls +- **Workspace Context**: Working directory and file system state +- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation +- **Secrets**: Managed credentials and API keys + +For the complete implementation details, see the [ConversationState class](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code. ## Next Steps -- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Control execution flow +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow - **[Async Operations](/sdk/guides/async)** - Non-blocking operations diff --git a/sdk/guides/send-message-while-processing.mdx b/sdk/guides/convo-send-message-while-running.mdx similarity index 84% rename from sdk/guides/send-message-while-processing.mdx rename to sdk/guides/convo-send-message-while-running.mdx index 949cb15f..114cbf2f 100644 --- a/sdk/guides/send-message-while-processing.mdx +++ b/sdk/guides/convo-send-message-while-running.mdx @@ -1,5 +1,5 @@ --- -title: Send Message While Processing +title: Send Message While Running description: Interrupt running agents to provide additional context or corrections. 
--- @@ -162,23 +162,36 @@ uv run python examples/01_standalone_sdk/18_send_message_while_processing.py ### Sending Messages During Execution -Use threading to send messages while the agent is running: +As shown in the example above, use threading to send messages while the agent is running: -```python highlight={4-5,7-8} -import threading +```python highlight={2-3,6-7,14} +# Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working -def send_correction(): - time.sleep(3) - conversation.send_message("Actually, use Python 3.11 instead") +second_time = timestamp() -thread = threading.Thread(target=send_correction) -thread.start() -conversation.run() +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) + +# Wait for completion +thread.join() ``` +The key steps are: +1. Start `conversation.run()` in a background thread +2. Send additional messages using `conversation.send_message()` while the agent is processing +3. Use `thread.join()` to wait for completion + The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. 
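The threading pattern itself can be exercised without the SDK. In this sketch a queue-draining worker stands in for `conversation.run()` and `inbox.put()` stands in for `conversation.send_message()` — it demonstrates only the concurrency shape (background run, mid-run message, join), not the SDK's internals.

```python
import queue
import threading
import time

STOP = "__stop__"  # sentinel that ends the run, standing in for the agent finishing
inbox: queue.Queue = queue.Queue()
handled: list[str] = []


def run() -> None:
    """Stand-in for conversation.run(): process queued messages until STOP."""
    while True:
        message = inbox.get()
        if message == STOP:
            break
        time.sleep(0.05)  # simulate the agent working on this message
        handled.append(message)


inbox.put("Write a sentence to document.txt")  # initial task
worker = threading.Thread(target=run)          # 1. start the run in a background thread
worker.start()

inbox.put("Also add a second sentence")        # 2. message sent while the run is active
inbox.put(STOP)
worker.join()                                  # 3. wait for completion

print(handled)
```

Because the worker keeps draining the queue until it sees the sentinel, the second message is folded into the same run rather than waiting for a new one — the same property the SDK example relies on.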
## Next Steps -- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Control execution flow -- **[Interactive Terminal](/sdk/guides/interactive-terminal)** - Stream events in real-time +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations diff --git a/sdk/guides/activate-skill.mdx b/sdk/guides/skill.mdx similarity index 100% rename from sdk/guides/activate-skill.mdx rename to sdk/guides/skill.mdx From 0375219f0fbd3ffa90cd6fa997227353adba54c5 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 17:54:49 -0400 Subject: [PATCH 11/58] fix image input --- docs.json | 5 ++--- .../{image-input.mdx => llm-image-input.mdx} | 22 +++++++++++++++---- 2 files changed, 20 insertions(+), 7 deletions(-) rename sdk/guides/{image-input.mdx => llm-image-input.mdx} (81%) diff --git a/docs.json b/docs.json index 1581a8bc..7534c0b2 100644 --- a/docs.json +++ b/docs.json @@ -193,7 +193,8 @@ "pages": [ "sdk/guides/llm-registry", "sdk/guides/llm-routing", - "sdk/guides/llm-reasoning" + "sdk/guides/llm-reasoning", + "sdk/guides/llm-image-input" ] }, { @@ -208,10 +209,8 @@ { "group": "Agent Capabilities", "pages": [ - "sdk/guides/async", "sdk/guides/planning-agent-workflow", "sdk/guides/browser-use", - "sdk/guides/image-input", "sdk/guides/custom-secrets" ] }, diff --git a/sdk/guides/image-input.mdx b/sdk/guides/llm-image-input.mdx similarity index 81% rename from sdk/guides/image-input.mdx rename to sdk/guides/llm-image-input.mdx index 89dce95c..2f9166a0 100644 --- a/sdk/guides/image-input.mdx +++ b/sdk/guides/llm-image-input.mdx @@ -7,9 +7,9 @@ description: Send images to multimodal agents for vision-based tasks and analysi This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) -Send images to multimodal LLMs for vision-based tasks like 
screenshot analysis, image processing, and visual QA: -```python icon="python" examples/01_standalone_sdk/17_image_input.py +You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA: + +```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py """OpenHands Agent SDK — Image Input Example. This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds @@ -120,13 +120,27 @@ uv run python examples/01_standalone_sdk/17_image_input.py ### Sending Images +The LLM you use must support image inputs (`llm.vision_is_active()` needs to be `True`). + Pass images along with text in the message content: -```python highlight={3-5} +```python highlight={14} from openhands.sdk import ImageContent +IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png" conversation.send_message( - content=["Analyze this screenshot", ImageContent(path="/path/to/image.png")] + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) ) ``` From 66ab2643fcfc382eec9cf4cbf95fee935a895954 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 17:55:00 -0400 Subject: [PATCH 12/58] fix pause and resume --- sdk/guides/convo-pause-and-resume.mdx | 91 +++++++++++++++++---------- 1 file changed, 58 insertions(+), 33 deletions(-) diff --git a/sdk/guides/convo-pause-and-resume.mdx b/sdk/guides/convo-pause-and-resume.mdx index aaf243e6..97a8ca08 100644 --- a/sdk/guides/convo-pause-and-resume.mdx +++ b/sdk/guides/convo-pause-and-resume.mdx @@ -3,11 +3,12 @@ title: Pause and Resume description: Pause agent execution, perform operations, and resume without losing state. 
--- + This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) -Pause agent execution mid-task and resume from where it left off: +Pause agent execution mid-task by calling `conversation.pause()`: ```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py import os @@ -21,7 +22,6 @@ from openhands.sdk import ( Agent, Conversation, ) -from openhands.sdk.conversation.state import AgentExecutionStatus from openhands.sdk.tool import Tool, register_tool from openhands.tools.execute_bash import BashTool from openhands.tools.file_editor import FileEditorTool @@ -54,30 +54,57 @@ agent = Agent(llm=llm, tools=tools) conversation = Conversation(agent, workspace=os.getcwd()) -print("Simple pause example - Press Ctrl+C to pause") +print("=" * 60) +print("Pause and Continue Example") +print("=" * 60) +print() + +# Phase 1: Start a long-running task +print("Phase 1: Starting agent with a task...") +conversation.send_message( + "Create a file called countdown.txt and write numbers from 100 down to 1, " + "one number per line. After you finish, summarize what you did." +) -# Send a message to get the conversation started -conversation.send_message("repeatedly say hello world and don't stop") +print(f"Initial status: {conversation.state.agent_status}") +print() # Start the agent in a background thread thread = threading.Thread(target=conversation.run) thread.start() -try: - # Main loop - similar to the user's sample script - while ( - conversation.state.agent_status != AgentExecutionStatus.FINISHED - and conversation.state.agent_status != AgentExecutionStatus.PAUSED - ): - # Send encouraging messages periodically - conversation.send_message("keep going! 
you can do it!") - time.sleep(1) -except KeyboardInterrupt: - conversation.pause() +# Let the agent work for a few seconds +print("Letting agent work for 5 seconds...") +time.sleep(5) +# Phase 2: Pause the agent +print() +print("Phase 2: Pausing the agent...") +conversation.pause() + +# Wait for the thread to finish (it will stop when paused) thread.join() -print(f"Agent status: {conversation.state.agent_status}") +print(f"Agent status after pause: {conversation.state.agent_status}") +print() + + +# Phase 3: Send a new message while paused +print("Phase 3: Sending a new message while agent is paused...") +conversation.send_message( + "Actually, stop working on countdown.txt. Instead, create a file called " + "hello.txt with just the text 'Hello, World!' in it." +) +print() + +# Phase 4: Resume the agent with .run() +print("Phase 4: Resuming agent with .run()...") +print(f"Status before resume: {conversation.state.agent_status}") + +# Resume execution +conversation.run() + +print(f"Final status: {conversation.state.agent_status}") ``` ```bash Running the Example @@ -90,29 +117,27 @@ uv run python examples/01_standalone_sdk/09_pause_example.py Pause the agent from another thread or after a delay: -```python highlight={4-6,9} -import threading -import time +```python highlight={11} +thread = threading.Thread(target=conversation.run) +thread.start() -def pause_after_delay(conversation, seconds): - time.sleep(seconds) - conversation.pause() +# Let the agent work for a few seconds +print("Letting agent work for 5 seconds...") +time.sleep(5) -thread = threading.Thread(target=pause_after_delay, args=(conversation, 5)) -thread.start() -conversation.run() # Will pause after 5 seconds +# Phase 2: Pause the agent +print() +print("Phase 2: Pausing the agent...") +conversation.pause() ``` -### Resuming Execution -Resume the paused conversation after performing operations: +### Resuming Execution -```python highlight={4-5} -# Agent is paused, perform operations -process_results() 
+Resume the paused conversation after performing operations by calling `conversation.run()` again: -conversation.resume() -conversation.run() # Continues from where it paused +```python +conversation.run() ``` ## Next Steps From 307870cfce35023e20097c04814c5b2f6ffc6e7e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 18:07:18 -0400 Subject: [PATCH 13/58] improve secrets --- docs.json | 4 +- .../{custom-secrets.mdx => secrets.mdx} | 37 +++++++++---------- 2 files changed, 20 insertions(+), 21 deletions(-) rename sdk/guides/{custom-secrets.mdx => secrets.mdx} (65%) diff --git a/docs.json b/docs.json index 7534c0b2..0b15df6c 100644 --- a/docs.json +++ b/docs.json @@ -188,6 +188,7 @@ "sdk/guides/context-condenser", "sdk/guides/security", "sdk/guides/metrics", + "sdk/guides/secrets", { "group": "LLM Features", "pages": [ @@ -210,8 +211,7 @@ "group": "Agent Capabilities", "pages": [ "sdk/guides/planning-agent-workflow", - "sdk/guides/browser-use", - "sdk/guides/custom-secrets" + "sdk/guides/browser-use" ] }, { diff --git a/sdk/guides/custom-secrets.mdx b/sdk/guides/secrets.mdx similarity index 65% rename from sdk/guides/custom-secrets.mdx rename to sdk/guides/secrets.mdx index d63bcf7a..55bbd8bc 100644 --- a/sdk/guides/custom-secrets.mdx +++ b/sdk/guides/secrets.mdx @@ -1,5 +1,5 @@ --- -title: Custom Secrets +title: Secrets Manager description: Provide environment variables and secrets to agent workspace securely. --- @@ -7,7 +7,7 @@ description: Provide environment variables and secrets to agent workspace secure This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) -Securely provide environment variables and secrets to your agent's workspace: +The Secrets Manager provides a secure way to handle sensitive data in your agent's workspace. 
It automatically detects secret references in bash commands, injects them as environment variables when needed, and masks secret values in command outputs to prevent accidental exposure. ```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py import os @@ -77,28 +77,27 @@ uv run python examples/01_standalone_sdk/12_custom_secrets.py ### Injecting Secrets -Pass environment variables to the workspace via `workspace_env`: +Use the `update_secrets()` method to add secrets to your conversation, as shown in the example above. -```python highlight={5-8} -workspace_env = { - "API_KEY": os.getenv("API_KEY"), - "DATABASE_URL": os.getenv("DATABASE_URL") -} -conversation = Conversation( - agent=agent, - workspace=workspace_path, - workspace_env=workspace_env -) -``` +Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems: + +```python highlight={4,11} +from openhands.sdk.conversation.secret_source import SecretSource + +# Static secret +conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) -The agent can then access these variables in tool executions: -```bash -echo $API_KEY -curl -H "Authorization: Bearer $API_KEY" https://api.example.com +# Dynamic secret using SecretSource +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" + +conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) ``` + ## Next Steps -- **[MCP Integration](/sdk/guides/mcp)** - Connect external services with OAuth +- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP - **[Security Analyzer](/sdk/guides/security-analyzer)** - Add security validation From 0a2634797ca5777a380d7e832fcc02e171c3be53 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 18:18:15 -0400 Subject: [PATCH 14/58] done with interactive terminal --- docs.json | 25 ++++----- ...nal.mdx => 
agent-interactive-terminal.mdx} | 56 +++++++------------ 2 files changed, 31 insertions(+), 50 deletions(-) rename sdk/guides/{interactive-terminal.mdx => agent-interactive-terminal.mdx} (54%) diff --git a/docs.json b/docs.json index 0b15df6c..63c11d64 100644 --- a/docs.json +++ b/docs.json @@ -199,26 +199,21 @@ ] }, { - "group": "Conversation Management", + "group": "Agent Features", "pages": [ - "sdk/guides/convo-persistence", - "sdk/guides/convo-pause-and-resume", - "sdk/guides/convo-send-message-while-running", - "sdk/guides/convo-async" - ] - }, - { - "group": "Agent Capabilities", - "pages": [ - "sdk/guides/planning-agent-workflow", - "sdk/guides/browser-use" + "sdk/guides/agent-interactive-terminal", + "sdk/guides/agent-browser-use", + "sdk/guides/agent-custom-planning-agent", + "sdk/guides/agent-stuck-detector" ] }, { - "group": "Agent Behavior", + "group": "Conversation Features", "pages": [ - "sdk/guides/stuck-detector", - "sdk/guides/interactive-terminal" + "sdk/guides/convo-persistence", + "sdk/guides/convo-pause-and-resume", + "sdk/guides/convo-send-message-while-running", + "sdk/guides/convo-async" ] }, { diff --git a/sdk/guides/interactive-terminal.mdx b/sdk/guides/agent-interactive-terminal.mdx similarity index 54% rename from sdk/guides/interactive-terminal.mdx rename to sdk/guides/agent-interactive-terminal.mdx index 34dc284b..33d47c63 100644 --- a/sdk/guides/interactive-terminal.mdx +++ b/sdk/guides/agent-interactive-terminal.mdx @@ -1,13 +1,14 @@ --- -title: Streaming & Interactive Terminal -description: Stream events in real-time to display agent progress and reasoning to users. +title: Interactive Terminal +description: Enable agents to interact with terminal applications like ipython, python REPL, and other interactive CLI tools. 
--- +The `BashTool` provides agents with the ability to interact with terminal applications that require back-and-forth communication, such as Python's interactive mode, ipython, database CLIs, and other REPL environments. This enables agents to execute commands within these interactive sessions, receive output, and send follow-up commands based on the results. + This example is available on GitHub: [examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py) -Stream agent events in real-time to display progress and reasoning to users: ```python icon="python" expandable examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py import os @@ -83,44 +84,29 @@ cd agent-sdk uv run python examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py ``` -### Streaming Mode - -Process events as they occur in real-time: +## How It Works -```python highlight={4-7} -conversation = Conversation(agent=agent) -conversation.send_message("Build a web server") - -for event in conversation.stream(): - if isinstance(event, Action): - print(f"šŸ”§ Action: {event}") - elif isinstance(event, Observation): - print(f"šŸ‘ļø Result: {event}") +```python highlight={6} +cwd = os.getcwd() +register_tool("BashTool", BashTool) +tools = [ + Tool( + name="BashTool", + params={"no_change_timeout_seconds": 3}, + ) +] ``` -### Single-Turn Mode -Wait for the agent to complete before continuing (see [Hello World](/sdk/guides/hello-world)): +The `BashTool` is configured with a `no_change_timeout_seconds` parameter that determines how long to wait for terminal updates before sending the output back to the agent. -```python highlight={3} -conversation = Conversation(agent=agent) -conversation.send_message("Create a Python script") -conversation.run() # Blocks until done -``` +In the example above, the agent should: +1. 
Enter Python's interactive mode by running `python3` +2. Execute Python code to get the current time +3. Exit the Python interpreter -### Displaying Reasoning - -Show the agent's thought process during execution: - -```python highlight={2-3} -for event in conversation.stream(): - if hasattr(event, 'reasoning') and event.reasoning: - print(f"šŸ’­ {event.reasoning}") - elif isinstance(event, Action): - print(f"šŸ”§ {event.tool_name}") -``` +The `BashTool` maintains the session state throughout these interactions, allowing the agent to send multiple commands within the same terminal session. Review the [BashTool](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-tools/openhands/tools/execute_bash/definition.py) and [terminal source code](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-tools/openhands/tools/execute_bash/terminal/terminal_session.py) to better understand how the interactive session is configured and managed. ## Next Steps -- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Control execution flow -- **[Async Operations](/sdk/guides/async)** - Non-blocking streaming +- **[Custom Tools](/sdk/guides/custom-tools)** - Create your own tools for specific use cases From 0c2d58df8d0d598ebb323952eedbb6c80955b09e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 18:24:44 -0400 Subject: [PATCH 15/58] audited browser-use doc --- ...{browser-use.mdx => agent-browser-use.mdx} | 32 ++++++++++--------- 1 file changed, 17 insertions(+), 15 deletions(-) rename sdk/guides/{browser-use.mdx => agent-browser-use.mdx} (60%) diff --git a/sdk/guides/browser-use.mdx b/sdk/guides/agent-browser-use.mdx similarity index 60% rename from sdk/guides/browser-use.mdx rename to sdk/guides/agent-browser-use.mdx index 9db0f953..d5e70de7 100644 --- a/sdk/guides/browser-use.mdx +++ b/sdk/guides/agent-browser-use.mdx @@ -7,7 +7,7 @@ description: Enable web browsing and interaction capabilities for your agent.
This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py) -Give your agent the ability to navigate websites, click elements, fill forms, and extract content: +The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. Built on top of [browser-use](https://github.com/browser-use/browser-use), it provides capabilities for navigating websites, clicking elements, filling forms, and extracting content - all through natural language instructions. ```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py import os @@ -75,7 +75,7 @@ conversation = Conversation( ) conversation.send_message( - "Could you go to https://all-hands.dev/ blog page and summarize main " + "Could you go to https://openhands.dev/ blog page and summarize main " "points of the latest blog?" ) conversation.run() @@ -93,23 +93,25 @@ cd agent-sdk uv run python examples/01_standalone_sdk/15_browser_use.py ``` -### Browser Agent +## How It Works -Use the preset browser agent with built-in browser tools: +The example demonstrates combining multiple tools to create a capable web research agent: -```python highlight={3} -from openhands.tools.preset.browser import get_browser_agent +1. **BrowserToolSet**: Provides automated browser control for web interaction +2. **FileEditorTool**: Allows the agent to read and write files if needed +3. **BashTool**: Enables command-line operations for additional functionality -agent = get_browser_agent(llm=llm) -conversation = Conversation(agent=agent) -conversation.send_message("Search for OpenHands on GitHub and summarize the README") -``` +The agent uses these tools to: +- Navigate to specified URLs +- Interact with web page elements (clicking, scrolling, etc.) 
+- Extract and analyze content from web pages +- Summarize information from multiple sources + +In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points. + +## Customization -The browser tool enables: -- Web navigation and page loading -- Element clicking and form filling -- Content extraction and screenshots -- Multi-page workflows +For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually register individual browser tools. Refer to the [BrowserToolSet definition](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-tools/openhands/tools/browser_use/definition.py) to see the available individual tools and create a `BrowserToolExecutor` with customized tool configurations before constructing the Agent. This gives you fine-grained control over which browser capabilities are exposed to the agent. ## Next Steps From a98faaa562ce4494b0cd46ac44bcf17749f9dcb7 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 18:28:52 -0400 Subject: [PATCH 16/58] add stuck detector --- sdk/guides/agent-stuck-detector.mdx | 119 ++++++++++++++++++++++++++++ sdk/guides/stuck-detector.mdx | 101 ----------------------- 2 files changed, 119 insertions(+), 101 deletions(-) create mode 100644 sdk/guides/agent-stuck-detector.mdx delete mode 100644 sdk/guides/stuck-detector.mdx diff --git a/sdk/guides/agent-stuck-detector.mdx b/sdk/guides/agent-stuck-detector.mdx new file mode 100644 index 00000000..e1494ae3 --- /dev/null +++ b/sdk/guides/agent-stuck-detector.mdx @@ -0,0 +1,119 @@ +--- +title: Stuck Detector +description: Detect and handle stuck agents automatically with built-in pattern detection.
+--- + + +This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) + + +The Stuck Detector automatically identifies when an agent enters unproductive patterns such as repeating the same actions, encountering repeated errors, or engaging in monologues. By analyzing the conversation history after the last user message, it detects five types of stuck patterns: + +1. **Repeating Action-Observation Cycles**: The same action produces the same observation repeatedly (4+ times) +2. **Repeating Action-Error Cycles**: The same action repeatedly results in errors (3+ times) +3. **Agent Monologue**: The agent sends multiple consecutive messages without user input or meaningful progress (3+ messages) +4. **Alternating Patterns**: Two different action-observation pairs alternate in a ping-pong pattern (6+ cycles) +5. **Context Window Errors**: Repeated context window errors that indicate memory management issues + +When enabled (which is the default), the stuck detector monitors the conversation in real-time and can automatically halt execution when stuck patterns are detected, preventing infinite loops and wasted resources. + +For more information about the detection algorithms and how pattern matching works, refer to the [StuckDetector source code](https://github.com/OpenHands/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py). + + +```python icon="python" expandable examples/01_standalone_sdk/20_stuck_detector.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +agent = get_default_agent(llm=llm) + +llm_messages = [] + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with built-in stuck detection +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), + # This is by default True, shown here for clarity of the example + stuck_detection=True, +) + +# Send a task that will be caught by stuck detection +conversation.send_message( + "Please execute 'ls' command 5 times, each in its own " + "action without any thought and then exit at the 6th step." +) + +# Run the conversation - stuck detection happens automatically +conversation.run() + +assert conversation.stuck_detector is not None +final_stuck_check = conversation.stuck_detector.is_stuck() +print(f"Final stuck status: {final_stuck_check}") + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/01_standalone_sdk/20_stuck_detector.py +``` + +## How It Works + +In this example, the agent is deliberately given a task designed to trigger stuck detection - executing the same `ls` command 5 times in a row. The stuck detector analyzes the event history and identifies the repetitive pattern: + +1. The conversation proceeds normally until the agent starts repeating actions +2. After detecting the pattern (4 identical action-observation pairs), the stuck detector flags the conversation as stuck +3. 
The conversation can then handle this gracefully, either by stopping execution or taking corrective action + +The example demonstrates that stuck detection is enabled by default (`stuck_detection=True`), and you can check the stuck status at any point using `conversation.stuck_detector.is_stuck()`. + +## Pattern Detection + +The stuck detector compares events based on their semantic content rather than object identity. For example: +- **Actions** are compared by their tool name, action content, and thought (ignoring IDs and metrics) +- **Observations** are compared by their observation content and tool name +- **Errors** are compared by their error messages +- **Messages** are compared by their content and source + +This allows the detector to identify truly repetitive behavior while ignoring superficial differences like timestamps or event IDs. + +## Next Steps + +- **[Conversation Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Manual execution control +- **[Hello World](/sdk/guides/hello-world)** - Learn the basics of the SDK diff --git a/sdk/guides/stuck-detector.mdx b/sdk/guides/stuck-detector.mdx deleted file mode 100644 index 38ea0edb..00000000 --- a/sdk/guides/stuck-detector.mdx +++ /dev/null @@ -1,101 +0,0 @@ ---- -title: Stuck Detector -description: Detect and handle stuck agents automatically with timeout mechanisms. 
---- - - -This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) - - -Automatically detect when an agent is stuck in loops or not making progress, and handle gracefully with timeouts: - -```python icon="python" examples/01_standalone_sdk/20_stuck_detector.py -import os - -from pydantic import SecretStr - -from openhands.sdk import ( - LLM, - Conversation, - Event, - LLMConvertibleEvent, - get_logger, -) -from openhands.tools.preset.default import get_default_agent - - -logger = get_logger(__name__) - -# Configure LLM -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." -model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") -base_url = os.getenv("LLM_BASE_URL") -llm = LLM( - usage_id="agent", - model=model, - base_url=base_url, - api_key=SecretStr(api_key), -) - -agent = get_default_agent(llm=llm) - -llm_messages = [] - - -def conversation_callback(event: Event): - if isinstance(event, LLMConvertibleEvent): - llm_messages.append(event.to_llm_message()) - - -# Create conversation with built-in stuck detection -conversation = Conversation( - agent=agent, - callbacks=[conversation_callback], - workspace=os.getcwd(), - # This is by default True, shown here for clarity of the example - stuck_detection=True, -) - -# Send a task that will be caught by stuck detection -conversation.send_message( - "Please execute 'ls' command 5 times, each in its own " - "action without any thought and then exit at the 6th step." -) - -# Run the conversation - stuck detection happens automatically -conversation.run() - -assert conversation.stuck_detector is not None -final_stuck_check = conversation.stuck_detector.is_stuck() -print(f"Final stuck status: {final_stuck_check}") - -print("=" * 100) -print("Conversation finished. 
Got the following LLM messages:") -for i, message in enumerate(llm_messages): - print(f"Message {i}: {str(message)[:200]}") -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/01_standalone_sdk/20_stuck_detector.py -``` - -### Configuring Stuck Detection - -Set limits on iterations and execution time: - -```python highlight={3-4} -from openhands.sdk import StuckDetector - -stuck_detector = StuckDetector(max_iterations=50, timeout_seconds=300) -agent = Agent(llm=llm, tools=tools, stuck_detector=stuck_detector) -``` - -The agent automatically stops after reaching max iterations or timeout, preventing infinite loops and runaway executions. - -## Next Steps - -- **[Pause and Resume](/sdk/guides/pause-and-resume)** - Manual execution control -- **[Confirmation Mode](/sdk/guides/confirmation-mode)** - Approve actions before execution From 0602d165b23886555c797218b91170a8d2635c2e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 18:38:21 -0400 Subject: [PATCH 17/58] improve custom planning agent --- ...ow.mdx => agent-custom-planning-agent.mdx} | 80 +++++++++++++++---- 1 file changed, 65 insertions(+), 15 deletions(-) rename sdk/guides/{planning-agent-workflow.mdx => agent-custom-planning-agent.mdx} (56%) diff --git a/sdk/guides/planning-agent-workflow.mdx b/sdk/guides/agent-custom-planning-agent.mdx index 6f3fa09e..95197e2f 100644 --- a/sdk/guides/planning-agent-workflow.mdx +++ b/sdk/guides/agent-custom-planning-agent.mdx @@ -1,15 +1,18 @@ --- -title: Planning Agent Workflow -description: Use planning-oriented agent with task tracking for complex multi-step workflows. +title: Creating Custom Agents +description: Learn how to design specialized agents with custom tool sets. --- +This guide demonstrates how to create custom agents tailored for specific use cases.
Using the planning agent as a concrete example, you'll learn how to design specialized agents with custom tool sets, system prompts, and configurations that optimize performance for particular workflows. + This example is available on GitHub: [examples/01_standalone_sdk/24_planning_agent_workflow.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) -Use a planning agent that breaks tasks into subtasks and tracks progress systematically: -```python icon="python" examples/01_standalone_sdk/24_planning_agent_workflow.py +The example showcases a two-phase workflow where a custom planning agent (with read-only tools) analyzes tasks and creates structured plans, followed by an execution agent that implements those plans with full editing capabilities. + +```python icon="python" expandable examples/01_standalone_sdk/24_planning_agent_workflow.py #!/usr/bin/env python3 """ Planning Agent Workflow Example @@ -151,22 +154,69 @@ cd agent-sdk uv run python examples/01_standalone_sdk/24_planning_agent_workflow.py ``` -### Using the Planning Agent +## Anatomy of a Custom Agent -Get a pre-configured planning agent: +The planning agent demonstrates the two key components for creating a specialized agent: + +### 1. Custom Tool Selection + +Choose tools that match your agent's specific role.
Here's how the planning agent defines its tools: + +```python + +def register_planning_tools() -> None: + """Register the planning agent tools.""" + from openhands.tools.glob import GlobTool + from openhands.tools.grep import GrepTool + from openhands.tools.planning_file_editor import PlanningFileEditorTool + + register_tool("GlobTool", GlobTool) + logger.debug("Tool: GlobTool registered.") + register_tool("GrepTool", GrepTool) + logger.debug("Tool: GrepTool registered.") + register_tool("PlanningFileEditorTool", PlanningFileEditorTool) + logger.debug("Tool: PlanningFileEditorTool registered.") -```python highlight={3-4} -from openhands.tools.preset.planning import get_planning_agent + +def get_planning_tools() -> list[Tool]: + """Get the planning agent tool specifications. + + Returns: + List of tools optimized for planning and analysis tasks, including + file viewing and PLAN.md editing capabilities for advanced + code discovery and navigation. + """ + register_planning_tools() + + return [ + Tool(name="GlobTool"), + Tool(name="GrepTool"), + Tool(name="PlanningFileEditorTool"), + ] -agent = get_planning_agent(llm=llm) -conversation = Conversation(agent=agent) -conversation.send_message("Build a web scraper with tests and documentation") -conversation.run() ``` -The planning agent systematically breaks down complex tasks into subtasks and tracks progress through multi-step workflows. +The planning agent uses: +- **GlobTool**: For discovering files and directories matching patterns +- **GrepTool**: For searching specific content across files +- **PlanningFileEditorTool**: For writing structured plans to `PLAN.md` only + +This read-only approach (except for `PLAN.md`) keeps the agent focused on analysis without implementation distractions. + +### 2. System Prompt Customization + +Custom agents can use specialized system prompts to guide behavior. The planning agent uses `system_prompt_planning.j2` with an injected plan structure that enforces: +1.
**Objective**: Clear goal statement +2. **Context Summary**: Relevant system components and constraints +3. **Approach Overview**: High-level strategy and rationale +4. **Implementation Steps**: Detailed step-by-step execution plan +5. **Testing and Validation**: Verification methods and success criteria + +### Complete Implementation Reference + +For a complete implementation example showing all these components working together, refer to the [planning agent preset source code](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-tools/openhands/tools/preset/planning.py). ## Next Steps -- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized tools for planning -- **[Skills](/sdk/guides/activate-skill)** - Compose planning with skills +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools for your use case +- **[Context Condenser](/sdk/guides/context-condenser)** - Optimize context management +- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP servers From d503f1c8e3e7f000070e3134d50fe65c63695f42 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025
18:56:57 -0400 Subject: [PATCH 19/58] fix broken link --- sdk/guides/context-condenser.mdx | 3 +-- sdk/guides/convo-async.mdx | 4 ++-- sdk/guides/convo-pause-and-resume.mdx | 4 ++-- sdk/guides/convo-persistence.mdx | 2 +- sdk/guides/convo-send-message-while-running.mdx | 2 +- sdk/guides/custom-tools.mdx | 4 ++-- sdk/guides/hello-world.mdx | 4 ++-- sdk/guides/llm-image-input.mdx | 2 +- sdk/guides/llm-reasoning.mdx | 6 +++--- sdk/guides/llm-registry.mdx | 2 +- sdk/guides/llm-routing.mdx | 2 +- sdk/guides/mcp.mdx | 2 +- sdk/guides/metrics.mdx | 1 - sdk/guides/remote-agent-server/api-sandboxed-server.mdx | 4 ++-- sdk/guides/remote-agent-server/browser-with-docker.mdx | 4 ++-- sdk/guides/remote-agent-server/docker-sandboxed-server.mdx | 6 +++--- sdk/guides/remote-agent-server/local-agent-server.mdx | 6 +++--- sdk/guides/remote-agent-server/vscode-with-docker.mdx | 3 +-- sdk/guides/secrets.mdx | 2 +- sdk/guides/security.mdx | 2 +- sdk/guides/skill.mdx | 2 +- 21 files changed, 32 insertions(+), 35 deletions(-) diff --git a/sdk/guides/context-condenser.mdx b/sdk/guides/context-condenser.mdx index b9db6bca..c06a333c 100644 --- a/sdk/guides/context-condenser.mdx +++ b/sdk/guides/context-condenser.mdx @@ -224,5 +224,4 @@ agent = Agent(llm=llm, tools=tools, condenser=condenser) ## Next Steps -- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage reduction -- **[Conversation Costs](/sdk/guides/conversation-costs)** - Analyze cost savings +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage reduction and analyze cost savings diff --git a/sdk/guides/convo-async.mdx b/sdk/guides/convo-async.mdx index b6e80c96..e1021396 100644 --- a/sdk/guides/convo-async.mdx +++ b/sdk/guides/convo-async.mdx @@ -145,5 +145,5 @@ async def main(): ## Next Steps -- **[Persistence](/sdk/guides/persistence)** - Save and restore conversation state -- **[Send Message While Processing](/sdk/guides/send-message-while-processing)** - Interrupt running agents +- 
**[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents diff --git a/sdk/guides/convo-pause-and-resume.mdx b/sdk/guides/convo-pause-and-resume.mdx index 97a8ca08..b9f56d89 100644 --- a/sdk/guides/convo-pause-and-resume.mdx +++ b/sdk/guides/convo-pause-and-resume.mdx @@ -142,5 +142,5 @@ conversation.run() ## Next Steps -- **[Persistence](/sdk/guides/persistence)** - Save and restore conversation state -- **[Send Message While Processing](/sdk/guides/send-message-while-processing)** - Interrupt running agents +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents diff --git a/sdk/guides/convo-persistence.mdx b/sdk/guides/convo-persistence.mdx index fd45314f..2523f17f 100644 --- a/sdk/guides/convo-persistence.mdx +++ b/sdk/guides/convo-persistence.mdx @@ -179,4 +179,4 @@ For the complete implementation details, see the [ConversationState class](https ## Next Steps - **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow -- **[Async Operations](/sdk/guides/async)** - Non-blocking operations +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations diff --git a/sdk/guides/convo-send-message-while-running.mdx b/sdk/guides/convo-send-message-while-running.mdx index 114cbf2f..df3ac386 100644 --- a/sdk/guides/convo-send-message-while-running.mdx +++ b/sdk/guides/convo-send-message-while-running.mdx @@ -194,4 +194,4 @@ The agent receives and incorporates the new message mid-execution, allowing for ## Next Steps - **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow -- **[Async Operations](/sdk/guides/async)** - Non-blocking operations +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations 
diff --git a/sdk/guides/custom-tools.mdx b/sdk/guides/custom-tools.mdx index 8440e8cf..c1455bdb 100644 --- a/sdk/guides/custom-tools.mdx +++ b/sdk/guides/custom-tools.mdx @@ -17,7 +17,7 @@ tools = get_default_tools() agent = Agent(llm=llm, tools=tools) ``` -See [Tools Overview](/sdk/architecture/tools/overview) for the complete list of available tools. +See [Tools Overview](/sdk/arch/tools/overview) for the complete list of available tools. ## Understanding the Tool System @@ -27,7 +27,7 @@ The SDK's tool system is built around three core components: 2. **Observation** - Defines output data (what the tool returns) 3. **Executor** - Implements the tool's logic (what the tool does) -These components are tied together by a **ToolDefinition** that registers the tool with the agent. For architectural details and advanced usage patterns, see [Tool System Architecture](/sdk/architecture/sdk/tool). +These components are tied together by a **ToolDefinition** that registers the tool with the agent. For architectural details and advanced usage patterns, see [Tool System Architecture](/sdk/arch/sdk/tool). ## Creating a Custom Tool diff --git a/sdk/guides/hello-world.mdx b/sdk/guides/hello-world.mdx index d33e6c11..43642b02 100644 --- a/sdk/guides/hello-world.mdx +++ b/sdk/guides/hello-world.mdx @@ -66,7 +66,7 @@ Use the preset agent with common built-in tools: agent = get_default_agent(llm=llm, cli_mode=True) ``` -The default agent includes BashTool, FileEditorTool, etc. See [Tools Overview](/sdk/architecture/tools/overview) for the complete list of available tools. +The default agent includes BashTool, FileEditorTool, etc. See [Tools Overview](/sdk/arch/tools/overview) for the complete list of available tools. 
### Conversation Start a conversation to manage the agent's lifecycle: @@ -99,4 +99,4 @@ FACTS.txt - **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for specialized needs - **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers -- **[Security Analyzer](/sdk/guides/security-analyzer)** - Add security validation to tool usage +- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage diff --git a/sdk/guides/llm-image-input.mdx b/sdk/guides/llm-image-input.mdx index 2f9166a0..1092b24e 100644 --- a/sdk/guides/llm-image-input.mdx +++ b/sdk/guides/llm-image-input.mdx @@ -149,4 +149,4 @@ Works with multimodal LLMs like GPT-4 Vision and Claude with vision capabilities ## Next Steps - **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns -- **[Async Operations](/sdk/guides/async)** - Process multiple images concurrently +- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently diff --git a/sdk/guides/llm-reasoning.mdx b/sdk/guides/llm-reasoning.mdx index e7539ec7..9d59db90 100644 --- a/sdk/guides/llm-reasoning.mdx +++ b/sdk/guides/llm-reasoning.mdx @@ -125,7 +125,7 @@ By registering a callback with your conversation, you can intercept and display This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) -OpenAI's latest models (e.g., GPT-5, GPT-5-Codex) support a [Responses API](Responses) that provides access to the model's reasoning process. By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. +OpenAI's latest models (e.g., GPT-5, GPT-5-Codex) support a [Responses API](https://platform.openai.com/docs/api-reference/responses) that provides access to the model's reasoning process. 
By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces. ```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py """ @@ -252,6 +252,6 @@ The OpenAI Responses API provides reasoning traces that show how the model appro ## Next Steps -- **[Interactive Terminal](/sdk/guides/interactive-terminal)** - Display reasoning in real-time -- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage and performance +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance - **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities diff --git a/sdk/guides/llm-registry.mdx b/sdk/guides/llm-registry.mdx index 424ed1ea..7c0dee1f 100644 --- a/sdk/guides/llm-registry.mdx +++ b/sdk/guides/llm-registry.mdx @@ -133,4 +133,4 @@ llm = llm_registry.get("agent") ## Next Steps - **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models -- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage and costs +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs diff --git a/sdk/guides/llm-routing.mdx b/sdk/guides/llm-routing.mdx index 16649036..17a78ed2 100644 --- a/sdk/guides/llm-routing.mdx +++ b/sdk/guides/llm-routing.mdx @@ -145,4 +145,4 @@ You may define your own router by extending the `Router` class. 
See the [base cl ## Next Steps - **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations -- **[LLM Metrics](/sdk/guides/llm-metrics)** - Track token usage and costs +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs diff --git a/sdk/guides/mcp.mdx b/sdk/guides/mcp.mdx index ff9c615e..334f0e70 100644 --- a/sdk/guides/mcp.mdx +++ b/sdk/guides/mcp.mdx @@ -244,4 +244,4 @@ mcp_config = { - **[MCP Architecture](/sdk/arch/sdk/mcp)** - Technical details and internals - **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools -- **[Security Analyzer](/sdk/guides/security-analyzer)** - Securing tool usage +- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage diff --git a/sdk/guides/metrics.mdx b/sdk/guides/metrics.mdx index 3713dd09..e8b73516 100644 --- a/sdk/guides/metrics.mdx +++ b/sdk/guides/metrics.mdx @@ -414,4 +414,3 @@ for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items( - **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs - **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models -- **[Conversation Costs](/sdk/guides/conversation-costs)** - Calculate costs per conversation diff --git a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx index 4373e11e..9f8bef79 100644 --- a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx +++ b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx @@ -38,5 +38,5 @@ No server management required - connect to hosted API. 
## Related Documentation -- [Agent Server Architecture](/sdk/architecture/agent-server) -- [Remote Workspace](/sdk/architecture/workspace/remote) +- [Agent Server Architecture](/sdk/arch/agent_server/overview) +- [Remote Workspace](/sdk/arch/workspace/remote_api) diff --git a/sdk/guides/remote-agent-server/browser-with-docker.mdx b/sdk/guides/remote-agent-server/browser-with-docker.mdx index 5d775df2..a3230976 100644 --- a/sdk/guides/remote-agent-server/browser-with-docker.mdx +++ b/sdk/guides/remote-agent-server/browser-with-docker.mdx @@ -40,5 +40,5 @@ Browser tools run in isolated Docker container with the agent. ## Related Documentation -- [Browser Tool](/sdk/architecture/tools/browser) -- [Docker Workspace](/sdk/architecture/workspace/docker) +- [Browser Tool](/sdk/arch/tools/browser_use) +- [Docker Workspace](/sdk/arch/workspace/docker) diff --git a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx index cf98a963..8c7967e3 100644 --- a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx +++ b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx @@ -179,6 +179,6 @@ docker run -v /local/data:/workspace/data \ ## Related Documentation -- **[Browser with Docker](/sdk/guides/remote-agent-server/03-browser-with-docker)** - Browser in container -- **[Workspace Architecture](/sdk/architecture/sdk/workspace)** - Technical design -- **[Agent Server Architecture](/sdk/architecture/agent-server)** - Server details +- **[Browser with Docker](/sdk/guides/remote-agent-server/browser-with-docker)** - Browser in container +- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design +- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details diff --git a/sdk/guides/remote-agent-server/local-agent-server.mdx b/sdk/guides/remote-agent-server/local-agent-server.mdx index 67383a43..c08c9c8d 100644 --- a/sdk/guides/remote-agent-server/local-agent-server.mdx +++ 
b/sdk/guides/remote-agent-server/local-agent-server.mdx @@ -86,6 +86,6 @@ conversation.run() ## Related Documentation -- **[Docker Sandboxed Server](/sdk/guides/remote-agent-server/02-docker-sandboxed-server)** - Isolated execution -- **[Agent Server Architecture](/sdk/architecture/agent-server)** - Server details -- **[Workspace Architecture](/sdk/architecture/sdk/workspace)** - Technical design +- **[Docker Sandboxed Server](/sdk/guides/remote-agent-server/docker-sandboxed-server)** - Isolated execution +- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details +- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design diff --git a/sdk/guides/remote-agent-server/vscode-with-docker.mdx b/sdk/guides/remote-agent-server/vscode-with-docker.mdx index 13be6c57..78aa7598 100644 --- a/sdk/guides/remote-agent-server/vscode-with-docker.mdx +++ b/sdk/guides/remote-agent-server/vscode-with-docker.mdx @@ -40,5 +40,4 @@ Agent uses VS Code tools for editing, navigation, and refactoring in isolated en ## Related Documentation -- [VS Code Tool](/sdk/architecture/tools/vscode) -- [Docker Workspace](/sdk/architecture/workspace/docker) +- [Docker Workspace](/sdk/arch/workspace/docker) diff --git a/sdk/guides/secrets.mdx b/sdk/guides/secrets.mdx index 55bbd8bc..505d5a2c 100644 --- a/sdk/guides/secrets.mdx +++ b/sdk/guides/secrets.mdx @@ -100,4 +100,4 @@ conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) ## Next Steps - **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP -- **[Security Analyzer](/sdk/guides/security-analyzer)** - Add security validation +- **[Security Analyzer](/sdk/guides/security)** - Add security validation diff --git a/sdk/guides/security.mdx b/sdk/guides/security.mdx index 5cdcd751..cddea4a0 100644 --- a/sdk/guides/security.mdx +++ b/sdk/guides/security.mdx @@ -437,4 +437,4 @@ For more details on the base class implementation, see the [source code](https:/ ## Next Steps - **[Custom 
Tools](/sdk/guides/custom-tools)** - Build secure custom tools -- **[Custom Secrets](/sdk/guides/custom-secrets)** - Secure credential management +- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management diff --git a/sdk/guides/skill.mdx b/sdk/guides/skill.mdx index 591569a0..49e99638 100644 --- a/sdk/guides/skill.mdx +++ b/sdk/guides/skill.mdx @@ -175,4 +175,4 @@ agent_context = AgentContext( - **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools - **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers -- **[Confirmation Mode](/sdk/guides/confirmation-mode)** - Add execution approval +- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval From a651e927f7fc9634f1e99d62ced16fd5ed92be80 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Tue, 21 Oct 2025 18:57:54 -0400 Subject: [PATCH 20/58] remove stuff not ready yet --- sdk/arch/agent_server/overview.mdx | 433 ---------------- sdk/arch/overview.mdx | 142 ----- sdk/arch/sdk/agent.mdx | 301 ----------- sdk/arch/sdk/condenser.mdx | 166 ------ sdk/arch/sdk/conversation.mdx | 487 ------------------ sdk/arch/sdk/event.mdx | 403 --------------- sdk/arch/sdk/llm.mdx | 416 --------------- sdk/arch/sdk/mcp.mdx | 333 ------------ sdk/arch/sdk/microagents.mdx | 225 -------- sdk/arch/sdk/security.mdx | 416 --------------- sdk/arch/sdk/tool.mdx | 199 ------- sdk/arch/sdk/workspace.mdx | 322 ------------ sdk/arch/tools/bash.mdx | 288 ----------- sdk/arch/tools/browser_use.mdx | 101 ---- sdk/arch/tools/file_editor.mdx | 338 ------------ sdk/arch/tools/glob.mdx | 89 ---- sdk/arch/tools/grep.mdx | 140 ----- sdk/arch/tools/overview.mdx | 185 ------- sdk/arch/tools/planning_file_editor.mdx | 128 ----- sdk/arch/tools/task_tracker.mdx | 146 ------ sdk/arch/workspace/docker.mdx | 330 ------------ sdk/arch/workspace/overview.mdx | 99 ---- sdk/arch/workspace/remote_api.mdx | 325 ------------ sdk/guides/github-workflows/pr-review.mdx | 65 --- 
.../github-workflows/routine-maintenance.mdx | 74 --- .../api-sandboxed-server.mdx | 42 -- .../browser-with-docker.mdx | 44 -- .../docker-sandboxed-server.mdx | 184 ------- .../local-agent-server.mdx | 91 ---- .../vscode-with-docker.mdx | 43 -- 30 files changed, 6555 deletions(-) delete mode 100644 sdk/arch/agent_server/overview.mdx delete mode 100644 sdk/arch/overview.mdx delete mode 100644 sdk/arch/sdk/agent.mdx delete mode 100644 sdk/arch/sdk/condenser.mdx delete mode 100644 sdk/arch/sdk/conversation.mdx delete mode 100644 sdk/arch/sdk/event.mdx delete mode 100644 sdk/arch/sdk/llm.mdx delete mode 100644 sdk/arch/sdk/mcp.mdx delete mode 100644 sdk/arch/sdk/microagents.mdx delete mode 100644 sdk/arch/sdk/security.mdx delete mode 100644 sdk/arch/sdk/tool.mdx delete mode 100644 sdk/arch/sdk/workspace.mdx delete mode 100644 sdk/arch/tools/bash.mdx delete mode 100644 sdk/arch/tools/browser_use.mdx delete mode 100644 sdk/arch/tools/file_editor.mdx delete mode 100644 sdk/arch/tools/glob.mdx delete mode 100644 sdk/arch/tools/grep.mdx delete mode 100644 sdk/arch/tools/overview.mdx delete mode 100644 sdk/arch/tools/planning_file_editor.mdx delete mode 100644 sdk/arch/tools/task_tracker.mdx delete mode 100644 sdk/arch/workspace/docker.mdx delete mode 100644 sdk/arch/workspace/overview.mdx delete mode 100644 sdk/arch/workspace/remote_api.mdx delete mode 100644 sdk/guides/github-workflows/pr-review.mdx delete mode 100644 sdk/guides/github-workflows/routine-maintenance.mdx delete mode 100644 sdk/guides/remote-agent-server/api-sandboxed-server.mdx delete mode 100644 sdk/guides/remote-agent-server/browser-with-docker.mdx delete mode 100644 sdk/guides/remote-agent-server/docker-sandboxed-server.mdx delete mode 100644 sdk/guides/remote-agent-server/local-agent-server.mdx delete mode 100644 sdk/guides/remote-agent-server/vscode-with-docker.mdx diff --git a/sdk/arch/agent_server/overview.mdx b/sdk/arch/agent_server/overview.mdx deleted file mode 100644 index 593cb646..00000000 --- 
a/sdk/arch/agent_server/overview.mdx +++ /dev/null @@ -1,433 +0,0 @@ ---- -title: Agent Server -description: HTTP server for remote agent execution with Docker-based sandboxing and API access. ---- - -The Agent Server provides HTTP API endpoints for remote agent execution. It enables centralized agent management, multi-user support, and production deployments. - -**Source**: [`openhands/agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server) - -## Purpose - -The Agent Server enables: -- **Remote Execution**: Run agents on dedicated servers -- **Multi-User Support**: Isolate execution per user -- **Resource Management**: Centralized resource allocation -- **API Access**: HTTP API for agent operations -- **Production Deployment**: Scalable agent infrastructure - -## Architecture - -```mermaid -graph TD - Client[Client SDK] -->|HTTPS| Server[Agent Server] - Server --> Router[FastAPI Router] - - Router --> Workspace[Workspace API] - Router --> Health[Health Check] - - Workspace --> Docker[Docker Manager] - Docker --> Container1[Container 1] - Docker --> Container2[Container 2] - - style Client fill:#e1f5fe - style Server fill:#fff3e0 - style Router fill:#e8f5e8 - style Docker fill:#f3e5f5 -``` - -## Quick Start - -### Using Pre-built Docker Image - -```bash -# Pull latest image -docker pull ghcr.io/all-hands-ai/agent-server:latest - -# Run server -docker run -d \ - -p 8000:8000 \ - -v /var/run/docker.sock:/var/run/docker.sock \ - ghcr.io/all-hands-ai/agent-server:latest -``` - -### Using Python - -```bash -# Install agent-server package -pip install openhands-agent-server - -# Start server -openhands-agent-server -``` - -## Building Docker Images - -**Source**: [`openhands/agent_server/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server/docker) - -### Build Script - -```bash -# Build from source -python -m openhands.agent_server.docker.build \ - --base-image ubuntu:22.04 \ - --target runtime \ - 
--platform linux/amd64 -``` - -### Build Options - -| Option | Description | Default | -|--------|-------------|---------| -| `--base-image` | Base Docker image | `ubuntu:22.04` | -| `--target` | Build target (`runtime` or `dev`) | `runtime` | -| `--platform` | Target platform | Host platform | -| `--output-image` | Output image name | Auto-generated | - -### Programmatic Build - -```python -from openhands.agent_server.docker.build import ( - BuildOptions, - build -) - -# Build custom image -image_name = build( - BuildOptions( - base_image="python:3.12", - target="runtime", - platform="linux/amd64" - ) -) - -print(f"Built image: {image_name}") -``` - -## Docker Images - -### Official Images - -```bash -# Latest release -ghcr.io/all-hands-ai/agent-server:latest - -# Specific version -ghcr.io/all-hands-ai/agent-server:v1.0.0 - -# Development build -ghcr.io/all-hands-ai/agent-server:dev -``` - -### Image Variants - -- **`runtime`**: Production-ready, minimal size -- **`dev`**: Development tools included - -## API Endpoints - -### Health Check - -```bash -GET /api/health -``` - -Returns server health status. 
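A rough client-side sketch: the request below targets the `POST /api/workspace/command` endpoint documented in this section, with an assumed base URL and a placeholder API key. It is assembled (not sent) so the shape is easy to inspect:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"   # assumed local agent server
API_KEY = "your-secret-key"          # placeholder, not a real key

def build_command_request(command: str,
                          working_dir: str = "/workspace",
                          timeout: float = 30.0) -> request.Request:
    """Assemble (but do not send) a POST /api/workspace/command request."""
    body = json.dumps(
        {"command": command, "working_dir": working_dir, "timeout": timeout}
    ).encode()
    return request.Request(
        f"{BASE_URL}/api/workspace/command",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_command_request("echo hello")
print(req.full_url)      # http://localhost:8000/api/workspace/command
print(req.get_method())  # POST
# urllib.request.urlopen(req) would actually execute the call
```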
- -### Execute Command - -```bash -POST /api/workspace/command -Content-Type: application/json -Authorization: Bearer - -{ - "command": "python script.py", - "working_dir": "/workspace", - "timeout": 30.0 -} -``` - -### File Upload - -```bash -POST /api/workspace/upload -Authorization: Bearer -Content-Type: multipart/form-data - -# Form data with file -``` - -### File Download - -```bash -GET /api/workspace/download?path=/workspace/output.txt -Authorization: Bearer -``` - -## Configuration - -### Environment Variables - -```bash -# Server configuration -export HOST=0.0.0.0 -export PORT=8000 -export API_KEY=your-secret-key - -# Docker configuration -export DOCKER_HOST=unix:///var/run/docker.sock - -# Logging -export LOG_LEVEL=INFO -export DEBUG=false -``` - -### Server Settings - -```python -# config.py -class Settings: - host: str = "0.0.0.0" - port: int = 8000 - api_key: str = "your-secret-key" - workers: int = 4 - timeout: float = 300.0 -``` - -## Deployment - -### Docker Compose - -```yaml -# docker-compose.yml -version: '3.8' - -services: - agent-server: - image: ghcr.io/all-hands-ai/agent-server:latest - ports: - - "8000:8000" - volumes: - - /var/run/docker.sock:/var/run/docker.sock - environment: - - API_KEY=your-secret-key - - LOG_LEVEL=INFO - restart: unless-stopped -``` - -### Kubernetes - -```yaml -# deployment.yaml -apiVersion: apps/v1 -kind: Deployment -metadata: - name: agent-server -spec: - replicas: 3 - selector: - matchLabels: - app: agent-server - template: - metadata: - labels: - app: agent-server - spec: - containers: - - name: agent-server - image: ghcr.io/all-hands-ai/agent-server:latest - ports: - - containerPort: 8000 - env: - - name: API_KEY - valueFrom: - secretKeyRef: - name: agent-server-secrets - key: api-key -``` - -### Systemd Service - -```ini -# /etc/systemd/system/agent-server.service -[Unit] -Description=OpenHands Agent Server -After=docker.service -Requires=docker.service - -[Service] -Type=simple -ExecStart=/usr/bin/docker run \ 
- --rm \ - -p 8000:8000 \ - -v /var/run/docker.sock:/var/run/docker.sock \ - ghcr.io/all-hands-ai/agent-server:latest - -Restart=always -RestartSec=10 - -[Install] -WantedBy=multi-user.target -``` - -## Security - -### Authentication - -```python -# API key authentication -from fastapi import Header, HTTPException - -async def verify_api_key(authorization: str = Header(None)): - if not authorization or not authorization.startswith("Bearer "): - raise HTTPException(status_code=401) - - api_key = authorization.split(" ")[1] - if api_key != expected_api_key: - raise HTTPException(status_code=403) -``` - -### Container Isolation - -- Each request executes in separate Docker container -- Containers have resource limits -- Network isolation between containers -- Automatic cleanup after execution - -### Rate Limiting - -```python -# Implement rate limiting per API key -from slowapi import Limiter - -limiter = Limiter(key_func=lambda: request.headers.get("Authorization")) - -@app.post("/api/workspace/command") -@limiter.limit("100/minute") -async def execute_command(...): - ... 
-``` - -## Monitoring - -### Health Checks - -```bash -# Check if server is running -curl http://localhost:8000/api/health - -# Response: -# {"status": "healthy", "version": "1.0.0"} -``` - -### Logging - -```python -# Structured logging -import logging - -logger = logging.getLogger("agent_server") -logger.info("Request received", extra={ - "user_id": user_id, - "command": command, - "duration": duration -}) -``` - -### Metrics - -Track important metrics: -- Request rate and latency -- Container creation/cleanup time -- Resource usage per container -- Error rates and types - -## Troubleshooting - -### Server Won't Start - -```bash -# Check port availability -netstat -tuln | grep 8000 - -# Check Docker socket -docker ps - -# Check logs -docker logs agent-server -``` - -### Container Creation Fails - -```bash -# Verify Docker permissions -docker run hello-world - -# Check Docker socket mount -ls -la /var/run/docker.sock - -# Check available resources -docker stats -``` - -### Performance Issues - -```bash -# Check resource usage -docker stats - -# Increase worker count -export WORKERS=8 - -# Optimize container startup -# Use pre-built images -# Reduce image size -``` - -## Best Practices - -1. **Use Pre-built Images**: Faster startup, consistent environment -2. **Set Resource Limits**: Prevent resource exhaustion -3. **Enable Monitoring**: Track performance and errors -4. **Implement Rate Limiting**: Prevent abuse -5. **Secure API Keys**: Use strong, rotated keys -6. **Use HTTPS**: Encrypt data in transit -7. **Regular Updates**: Keep images updated -8. 
**Backup Configuration**: Version control configurations - -## Development - -### Running Locally - -```bash -# Clone repository -git clone https://github.com/All-Hands-AI/agent-sdk.git -cd agent-sdk - -# Install dependencies -pip install -e ".[server]" - -# Run development server -uvicorn openhands.agent_server.main:app --reload -``` - -### Testing - -```bash -# Run tests -pytest openhands/agent_server/tests/ - -# Test specific endpoint -curl -X POST http://localhost:8000/api/workspace/command \ - -H "Authorization: Bearer test-key" \ - -H "Content-Type: application/json" \ - -d '{"command": "echo test", "working_dir": "/workspace"}' -``` - -## See Also - -- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based local execution -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Client for agent server -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Server usage examples -- **[FastAPI Documentation](https://fastapi.tiangolo.com/)** - Web framework used diff --git a/sdk/arch/overview.mdx b/sdk/arch/overview.mdx deleted file mode 100644 index 6662ba9b..00000000 --- a/sdk/arch/overview.mdx +++ /dev/null @@ -1,142 +0,0 @@ ---- -title: Overview -description: A modular framework for building AI agents, organized into four packages for clarity and extensibility. ---- - -The OpenHands Agent SDK is organized into four packages, each serving a distinct purpose in the agent development lifecycle. - -## Package Structure - -```mermaid -graph TD - SDK[SDK Package
Core Framework] --> Tools[Tools Package<br/>
Built-in Tools] - SDK --> Workspace[Workspace Package<br/>
Execution Environments] - SDK --> AgentServer[Agent Server Package<br/>
Remote Execution] - - Tools -.->|Used by| SDK - Workspace -.->|Used by| SDK - AgentServer -.->|Hosts| SDK - - style SDK fill:#e1f5fe - style Tools fill:#e8f5e8 - style Workspace fill:#fff3e0 - style AgentServer fill:#f3e5f5 -``` - -## 1. SDK Package - -Core framework for building agents locally. - -**Key Components:** -- **[Tool System](/sdk/architecture/sdk/tool)** - Define custom capabilities -- **[Microagents](/sdk/architecture/sdk/microagents)** - Specialized behavior modules -- **[Condenser](/sdk/architecture/sdk/condenser)** - Memory management -- **[Agent](/sdk/architecture/sdk/agent)** - Base agent interface -- **[Workspace](/sdk/architecture/sdk/workspace)** - Execution abstraction -- **[Conversation](/sdk/architecture/sdk/conversation)** - Lifecycle management -- **[Event](/sdk/architecture/sdk/event)** - Event system -- **[LLM](/sdk/architecture/sdk/llm)** - Language model integration -- **[MCP](/sdk/architecture/sdk/mcp)** - Model Context Protocol -- **[Security](/sdk/architecture/sdk/security)** - Security framework - -## 2. Tools Package - -Production-ready tool implementations. - -**Available Tools:** -- **[BashTool](/sdk/architecture/tools/bash)** - Command execution -- **[FileEditorTool](/sdk/architecture/tools/file_editor)** - File manipulation -- **[GlobTool](/sdk/architecture/tools/glob)** - File discovery -- **[GrepTool](/sdk/architecture/tools/grep)** - Content search -- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker)** - Task management -- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor)** - Multi-file workflows -- **[BrowserUseTool](/sdk/architecture/tools/browser_use)** - Web interaction - -## 3. Workspace Package - -Advanced execution environments for production. 
- -**Workspace Types:** -- **[DockerWorkspace](/sdk/architecture/workspace/docker)** - Container-based isolation -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api)** - Remote server execution - -See [Workspace Overview](/sdk/architecture/workspace/overview) for comparison. - -## 4. Agent Server Package - -HTTP server for centralized agent execution. - -**Capabilities:** -- Remote agent execution via API -- Multi-user isolation -- Container management -- Resource allocation - -See [Agent Server Documentation](/sdk/architecture/agent_server/overview). - -## Component Interaction - -```mermaid -graph LR - User[User] -->|Message| Conv[Conversation] - Conv -->|Manages| Agent[Agent] - - Agent -->|Reasons with| LLM[LLM] - Agent -->|Executes| Tools[Tools] - Agent -->|Guided by| Micro[Microagents] - - Tools -->|Run in| Workspace[Workspace] - - style User fill:#e1f5fe - style Conv fill:#fff3e0 - style Agent fill:#f3e5f5 - style LLM fill:#e8f5e8 - style Tools fill:#fce4ec - style Workspace fill:#e0f2f1 -``` - -## Design Principles - -### Immutability & Serialization -All core classes are: -- **Immutable**: State changes create new instances -- **Serializable**: Full conversation state can be saved/restored -- **Type-safe**: Pydantic models ensure data integrity - -### Modularity -- **Composable**: Mix and match components as needed -- **Extensible**: Add custom tools, LLMs, or workspaces -- **Testable**: Each component can be tested in isolation - -### Backward Compatibility -- **Semantic versioning** indicates compatibility levels -- **Migration guides** provided for major changes - -## Getting Started - -New to the SDK? 
Start with the guides: - -- **[Getting Started](/sdk/guides/getting-started)** - Quick introduction -- **[Streaming Mode](/sdk/guides/streaming-mode)** - Execution patterns -- **[Tools & MCP](/sdk/guides/tools-and-mcp)** - Extending capabilities -- **[Workspaces](/sdk/guides/workspaces)** - Execution environments -- **[Sub-agents](/sdk/guides/subagents)** - Agent delegation - -## Deep Dive - -Explore individual components: - -- **SDK Package** - [Tool](/sdk/architecture/sdk/tool) | [Agent](/sdk/architecture/sdk/agent) | [LLM](/sdk/architecture/sdk/llm) | [Conversation](/sdk/architecture/sdk/conversation) -- **Tools Package** - [BashTool](/sdk/architecture/tools/bash) | [FileEditorTool](/sdk/architecture/tools/file_editor) -- **Workspace Package** - [DockerWorkspace](/sdk/architecture/workspace/docker) | [RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api) -- **Agent Server** - [Overview](/sdk/architecture/agent_server/overview) - -## Examples - -Browse the [`examples/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples) directory for practical implementations: - -- **Hello World** - Basic agent usage -- **Custom Tools** - Creating new capabilities -- **Docker Workspace** - Sandboxed execution -- **MCP Integration** - External tool servers -- **Planning Agent** - Multi-step workflows diff --git a/sdk/arch/sdk/agent.mdx b/sdk/arch/sdk/agent.mdx deleted file mode 100644 index 3c0da066..00000000 --- a/sdk/arch/sdk/agent.mdx +++ /dev/null @@ -1,301 +0,0 @@ ---- -title: Agent -description: Core orchestrator combining language models with tools to execute tasks through structured reasoning loops. ---- - -The Agent orchestrates LLM reasoning with tool execution to solve tasks. It manages the reasoning loop, system prompts, and state transitions while maintaining conversation context. 
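In outline, the reasoning loop described above is a step function applied repeatedly until the agent signals completion. A toy sketch with stand-in types (not the real SDK classes):

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """Stand-in for ConversationState."""
    messages: list = field(default_factory=list)
    done: bool = False

def fake_llm(state: State) -> str:
    """Stand-in for the LLM: requests one tool call, then finishes."""
    return "finish" if state.messages else "run_tool"

def step(state: State) -> State:
    """One reasoning step: ask the LLM, then act on its decision."""
    decision = fake_llm(state)
    if decision == "finish":
        state.done = True
    else:
        state.messages.append("tool output")  # pretend tool execution
    return state

state = State()
while not state.done:       # the conversation drives step() until termination
    state = step(state)
print(len(state.messages))  # 1
```

The real loop differs in the details (immutable state, real tool routing, a `FinishTool` call as the termination signal), but the shape is the same.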
- -**Source**: [`openhands/sdk/agent/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent) - -## Core Concepts - -```mermaid -graph TD - Agent[Agent] --> LLM[LLM] - Agent --> Tools[Tools] - Agent --> Context[AgentContext] - Agent --> Condenser[Condenser] - - Context --> Microagents[Microagents] - Tools --> Bash[BashTool] - Tools --> FileEditor[FileEditorTool] - Tools --> MCP[MCP Tools] - - style Agent fill:#e1f5fe - style LLM fill:#fff3e0 - style Tools fill:#e8f5e8 - style Context fill:#f3e5f5 -``` - -An agent combines: -- **LLM**: Language model for reasoning and decision-making -- **Tools**: Capabilities to interact with the environment -- **Context**: Additional knowledge and specialized expertise -- **Condenser**: Memory management for long conversations - -## Base Interface - -**Source**: [`openhands/sdk/agent/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/base.py) - -### AgentBase - -Abstract base class defining the agent interface: - -```python -from openhands.sdk.agent import AgentBase -from openhands.sdk.conversation import ConversationState - -class CustomAgent(AgentBase): - def step(self, state: ConversationState) -> ConversationState: - """Execute one reasoning step and return updated state.""" - # Your agent logic here - return updated_state -``` - -**Key Properties**: -- **Immutable**: Agents are frozen Pydantic models -- **Serializable**: Full agent configuration can be saved/restored -- **Type-safe**: Strict type checking with Pydantic validation - -## Agent Implementation - -**Source**: [`openhands/sdk/agent/agent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/agent.py) - -### Initialization Arguments - -```python -from openhands.sdk import Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[ - 
BashTool.create(), - FileEditorTool.create() - ], - mcp_config={}, # Optional MCP configuration - filter_tools_regex=None, # Optional regex to filter tools - agent_context=None, # Optional context with microagents - condenser=None, # Optional context condenser - security_analyzer=None, # Optional security analyzer - confirmation_policy=None, # Optional confirmation policy -) -``` - -### Key Parameters - -| Parameter | Type | Description | -|-----------|------|-------------| -| `llm` | `LLM` | Language model configuration (required) | -| `tools` | `list[Tool]` | Tools available to the agent | -| `mcp_config` | `dict` | MCP server configuration for external tools | -| `filter_tools_regex` | `str` | Regex to filter available tools | -| `agent_context` | `AgentContext` | Additional context and microagents | -| `condenser` | `CondenserBase` | Context condensation strategy | -| `security_analyzer` | `SecurityAnalyzer` | Security risk analysis | -| `confirmation_policy` | `ConfirmationPolicy` | Action confirmation strategy | - -## Agent Lifecycle - -```mermaid -sequenceDiagram - participant User - participant Conversation - participant Agent - participant LLM - participant Tools - - User->>Conversation: Start conversation - Conversation->>Agent: Initialize state - loop Until task complete - Conversation->>Agent: step(state) - Agent->>LLM: Generate response - LLM->>Agent: Tool calls + reasoning - Agent->>Tools: Execute actions - Tools->>Agent: Observations - Agent->>Conversation: Updated state - end - Conversation->>User: Final result -``` - -### Execution Flow - -1. **Initialization**: Create agent with LLM and tools -2. **State Setup**: Pass agent to conversation -3. **Reasoning Loop**: Conversation calls `agent.step(state)` repeatedly -4. **Tool Execution**: Agent executes tool calls from LLM -5. **State Updates**: Agent returns updated conversation state -6. 
**Termination**: Loop ends when agent calls `FinishTool` - -## Usage Examples - -### Basic Agent - -See [`examples/01_standalone_sdk/01_hello_world.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py): - -```python -from openhands.sdk import Agent, LLM, Conversation -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -# Create LLM -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") -) - -# Create agent -agent = Agent( - llm=llm, - tools=[ - BashTool.create(), - FileEditorTool.create() - ] -) - -# Use with conversation -conversation = Conversation(agent=agent) -await conversation.run(user_message="Your task here") -``` - -### Agent with Context - -See [`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py): - -```python -from openhands.sdk import Agent, AgentContext - -# Create context with microagents -context = AgentContext( - microagents=["testing_expert", "code_reviewer"] -) - -agent = Agent( - llm=llm, - tools=tools, - agent_context=context -) -``` - -### Agent with Memory Management - -See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): - -```python -from openhands.sdk.context import LLMCondenser - -condenser = LLMCondenser( - max_tokens=8000, - target_tokens=6000 -) - -agent = Agent( - llm=llm, - tools=tools, - condenser=condenser -) -``` - -### Agent with MCP Tools - -See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py): - -```python -mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } -} - -agent = Agent( - llm=llm, - tools=tools, - mcp_config=mcp_config -) 
-``` - -### Planning Agent Workflow - -See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) for a complete example of multi-phase agent workflows. - -## System Prompts - -**Source**: [`openhands/sdk/agent/prompts/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent/prompts) - -Agents use Jinja2 templates for system prompts. Available templates: - -| Template | Use Case | Source | -|----------|----------|--------| -| `system_prompt.j2` | Default reasoning and tool usage | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt.j2) | -| `system_prompt_interactive.j2` | Interactive conversations | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_interactive.j2) | -| `system_prompt_long_horizon.j2` | Complex multi-step tasks | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_long_horizon.j2) | -| `system_prompt_planning.j2` | Planning-focused workflows | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_planning.j2) | - -### Custom Prompts - -Create custom agent classes with specialized prompts: - -```python -class PlanningAgent(Agent): - system_prompt_filename: str = "system_prompt_planning.j2" -``` - -## Custom Agent Development - -### Extending AgentBase - -```python -from openhands.sdk.agent import AgentBase -from openhands.sdk.conversation import ConversationState - -class SpecializedAgent(AgentBase): - # Custom configuration - max_iterations: int = 10 - - def step(self, state: ConversationState) -> ConversationState: - # Custom reasoning logic - # Tool selection and execution - # State management - return updated_state -``` - -### Multi-Agent Composition - -```python -class WorkflowAgent(AgentBase): - planning_agent: Agent - 
execution_agent: Agent - - def step(self, state: ConversationState) -> ConversationState: - # Phase 1: Planning - plan = self.planning_agent.step(state) - - # Phase 2: Execution - result = self.execution_agent.step(plan) - - return result -``` - -## Best Practices - -1. **Tool Selection**: Provide only necessary tools to reduce complexity -2. **Clear Instructions**: Use detailed user messages for better task understanding -3. **Context Management**: Use condensers for long-running conversations -4. **Error Handling**: Implement proper error recovery strategies -5. **Security**: Use confirmation policies for sensitive operations -6. **Testing**: Test agents with various scenarios and edge cases - -## See Also - -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Defining and using tools -- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing agent conversations -- **[LLM](/sdk/architecture/sdk/llm.mdx)** - Language model configuration -- **[MCP](/sdk/architecture/sdk/mcp.mdx)** - External tool integration -- **[Security](/sdk/architecture/sdk/security.mdx)** - Security and confirmation policies diff --git a/sdk/arch/sdk/condenser.mdx b/sdk/arch/sdk/condenser.mdx deleted file mode 100644 index 59d59da6..00000000 --- a/sdk/arch/sdk/condenser.mdx +++ /dev/null @@ -1,166 +0,0 @@ ---- -title: Context Condenser -description: Manage agent memory by intelligently compressing conversation history when approaching token limits. ---- - -The context condenser manages agent memory by intelligently compressing conversation history when approaching token limits. This enables agents to maintain coherent context in long-running conversations without exceeding LLM context windows. - -**Source**: [`openhands/sdk/context/condenser/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/condenser) - -## Why Context Condensation? 
- -```mermaid -graph LR - A[Long Conversation] --> B{Token Limit?} - B -->|Approaching| C[Condense] - B -->|Within Limit| D[Continue] - C --> E[Compressed Context] - E --> F[Agent with Memory] - D --> F - - style A fill:#e1f5fe - style C fill:#fff3e0 - style E fill:#e8f5e8 - style F fill:#f3e5f5 -``` - -As conversations grow, they may exceed LLM context windows. Condensers solve this by: -- Summarizing older messages while preserving key information -- Maintaining recent context in full detail -- Reducing token count without losing conversation coherence - -## LLM Condenser (Default) - -**Source**: [`openhands/sdk/context/condenser/llm_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/condenser/llm_condenser.py) - -The default condenser uses an LLM to intelligently summarize conversation history. - -### How It Works - -1. **Monitor Token Count**: Tracks conversation token usage -2. **Trigger Condensation**: Activates when approaching token threshold -3. **Summarize History**: Uses LLM to compress older messages -4. **Preserve Recent**: Keeps recent messages uncompressed -5. 
**Update Context**: Replaces verbose history with summary - -### Configuration - -```python -from openhands.sdk.context import LLMCondenser - -condenser = LLMCondenser( - max_tokens=8000, # Trigger condensation at this limit - target_tokens=6000, # Reduce to this token count - preserve_recent=10 # Keep last N messages uncompressed -) - -agent = Agent( - llm=llm, - tools=tools, - condenser=condenser -) -``` - -### Example Usage - -See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): - -```python -from openhands.sdk import Agent, LLM -from openhands.sdk.context import LLMCondenser -from pydantic import SecretStr - -# Configure condenser -condenser = LLMCondenser( - max_tokens=8000, - target_tokens=6000 -) - -# Create agent with condenser -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") -) - -agent = Agent( - llm=llm, - tools=tools, - condenser=condenser -) -``` - -## Condensation Strategy - -### Multi-Phase Approach - -```mermaid -sequenceDiagram - participant Agent - participant Condenser - participant LLM - - Agent->>Condenser: Check token count - Condenser->>Condenser: Exceeds threshold? 
- Condenser->>LLM: Summarize old messages - LLM->>Condenser: Summary - Condenser->>Agent: Updated context - Agent->>Agent: Continue with condensed history -``` - -### What Gets Condensed - -- **System messages**: Preserved as-is -- **Recent messages**: Kept in full (configurable count) -- **Older messages**: Summarized into compact form -- **Tool results**: Preserved for reference -- **User preferences**: Maintained across condensation - -## Custom Condensers - -Implement custom condensation strategies by extending the base class: - -```python -from openhands.sdk.context import CondenserBase -from openhands.sdk.conversation import ConversationState - -class CustomCondenser(CondenserBase): - def condense(self, state: ConversationState) -> ConversationState: - """Implement custom condensation logic.""" - # Your condensation algorithm produces a smaller state - return state - - def should_condense(self, state: ConversationState) -> bool: - """Determine when to trigger condensation.""" - # Your trigger logic, e.g. compare token count to a threshold - return False -``` - -## Best Practices - -1. **Set Appropriate Thresholds**: Leave buffer room below actual limit -2. **Preserve Recent Context**: Keep enough messages for coherent flow -3. **Monitor Performance**: Track condensation frequency and effectiveness -4. **Test Condensation**: Verify important information isn't lost -5.
**Adjust Per Use Case**: Different tasks need different settings - -## Configuration Guidelines - -| Use Case | max_tokens | target_tokens | preserve_recent | -|----------|-----------|---------------|-----------------| -| Short tasks | 4000 | 3000 | 5 | -| Medium conversations | 8000 | 6000 | 10 | -| Long-running agents | 16000 | 12000 | 20 | -| Code-heavy tasks | 12000 | 10000 | 15 | - -## Performance Considerations - -- **Condensation Cost**: Uses additional LLM calls -- **Latency**: Brief pause during condensation -- **Context Quality**: Trade-off between compression and information retention -- **Frequency**: Tune thresholds to minimize condensation events - -## See Also - -- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using condensers with agents -- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py)** - Working example -- **[Conversation State](/sdk/architecture/sdk/conversation.mdx)** - Managing conversation state diff --git a/sdk/arch/sdk/conversation.mdx b/sdk/arch/sdk/conversation.mdx deleted file mode 100644 index e702fb36..00000000 --- a/sdk/arch/sdk/conversation.mdx +++ /dev/null @@ -1,487 +0,0 @@ ---- -title: Conversation -description: Manage agent lifecycles through structured message flows and state persistence. ---- - -The Conversation class orchestrates agent execution through structured message flows. It manages the agent lifecycle, state persistence, and provides APIs for interaction and monitoring. 
- -**Source**: [`openhands/sdk/conversation/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/conversation) - -## Core Concepts - -```mermaid -graph LR - User[User] --> Conversation[Conversation] - Conversation --> Agent[Agent] - Conversation --> State[ConversationState] - Conversation --> Events[Event History] - - Agent --> Step["step()"] - State --> Persistence[Persistence] - - style Conversation fill:#e1f5fe - style Agent fill:#f3e5f5 - style State fill:#fff3e0 - style Events fill:#e8f5e8 -``` - -A conversation: -- **Manages Agent Lifecycle**: Initializes and runs agents until completion -- **Handles State**: Maintains conversation history and context -- **Enables Interaction**: Send messages and receive responses -- **Provides Persistence**: Save and restore conversation state -- **Monitors Progress**: Track execution stats and events - -## Basic API - -**Source**: [`openhands/sdk/conversation/conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation.py) - -### Creating a Conversation - -```python -from openhands.sdk import Conversation, Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -# Create agent -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Create conversation -conversation = Conversation( - agent=agent, - workspace="workspace/project", # Working directory - persistence_dir="conversations", # Save conversation state - max_iteration_per_run=500, # Max steps per run - stuck_detection=True, # Detect infinite loops - visualize=True # Generate execution visualizations -) -``` - -### Constructor Parameters - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `agent` | `AgentBase` | *Required* | Agent to run in the conversation | -| `workspace` | `str \| LocalWorkspace \| 
RemoteWorkspace` | `"workspace/project"` | Execution environment | -| `persistence_dir` | `str \| None` | `None` | Directory for saving state | -| `conversation_id` | `ConversationID \| None` | `None` | Resume existing conversation | -| `callbacks` | `list[ConversationCallbackType] \| None` | `None` | Event callbacks | -| `max_iteration_per_run` | `int` | `500` | Maximum steps per `run()` call | -| `stuck_detection` | `bool` | `True` | Enable stuck detection | -| `visualize` | `bool` | `True` | Generate visualizations | -| `secrets` | `dict \| None` | `None` | Secret values for agent | - -## Agent Lifecycle - -```mermaid -sequenceDiagram - participant User - participant Conversation - participant Agent - participant State - - User->>Conversation: Create conversation(agent) - Conversation->>State: Initialize state - Conversation->>Agent: init_state() - - User->>Conversation: send_message("Task") - Conversation->>State: Add message event - - User->>Conversation: run() - loop Until agent finishes or max iterations - Conversation->>Agent: step(state) - Agent->>State: Update with actions/observations - Conversation->>User: Callback with events - end - - User->>Conversation: agent_final_response() - Conversation->>User: Return final result -``` - -### 1. Create Agent - -Define agent with LLM and tools: - -```python -agent = Agent(llm=llm, tools=tools) -``` - -### 2. Create Conversation - -Pass agent to conversation: - -```python -conversation = Conversation(agent=agent) -``` - -### 3. Send Messages - -Add user messages to conversation: - -```python -conversation.send_message("Build a web scraper for news articles") -``` - -### 4. Run Agent - -Execute agent until task completion: - -```python -conversation.run() -``` - -The conversation will call `agent.step(state)` repeatedly until: -- Agent calls `FinishTool` -- Maximum iterations reached -- Agent encounters an error -- User pauses execution - -### 5. 
Get Results - -Retrieve agent's final response: - -```python -result = conversation.agent_final_response() -print(result) -``` - -## Core Methods - -**Source**: [`openhands/sdk/conversation/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/base.py) - -### send_message() - -Add a message to the conversation: - -```python -# String message -conversation.send_message("Write unit tests for the API") - -# Message object with images -from openhands.sdk.llm import Message, ImageContent - -message = Message( - role="user", - content=[ - "What's in this image?", - ImageContent(source="path/to/image.png") - ] -) -conversation.send_message(message) -``` - -See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py). - -### run() - -Execute the agent until completion or max iterations: - -```python -# Synchronous execution -conversation.run() - -# Async execution -await conversation.run() -``` - -See [`examples/01_standalone_sdk/11_async.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) for async usage. - -### agent_final_response() - -Get the agent's final response: - -```python -final_response = conversation.agent_final_response() -``` - -### pause() - -Pause agent execution: - -```python -conversation.pause() -``` - -See [`examples/01_standalone_sdk/09_pause_example.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py). 
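Because `run()` blocks until the agent finishes, `pause()` is typically called from another thread or a signal handler. The following self-contained sketch shows only that control flow, using a toy class rather than the SDK API:

```python
import threading
import time

class MiniConversation:
    """Toy stand-in for Conversation, illustrating the pause() control flow."""

    def __init__(self, max_iterations: int = 100):
        self.max_iterations = max_iterations
        self.steps_taken = 0
        self._paused = threading.Event()

    def pause(self) -> None:
        # Safe to call from any thread; run() checks the flag between steps.
        self._paused.set()

    def run(self) -> None:
        for _ in range(self.max_iterations):
            if self._paused.is_set():
                break
            self.steps_taken += 1
            time.sleep(0.01)  # simulate one agent step

conversation = MiniConversation()
worker = threading.Thread(target=conversation.run)
worker.start()
time.sleep(0.05)      # let a few steps execute
conversation.pause()  # request a stop from the main thread
worker.join()
print(f"Stopped after {conversation.steps_taken} steps")
```

The key point is that the pause request is observed between steps, so execution stops at a step boundary rather than mid-action.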
- -### close() - -Clean up resources: - -```python -conversation.close() -``` - -## Conversation State - -**Source**: [`openhands/sdk/conversation/state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/state.py) - -### Accessing State - -```python -state = conversation.state - -# Conversation properties -print(state.id) # Unique conversation ID -print(state.agent_status) # Current execution status -print(state.events) # Event history - -# Agent and workspace -print(state.agent) # The agent instance -print(state.workspace) # The workspace -``` - -### Agent Execution Status - -```python -from openhands.sdk.conversation.state import AgentExecutionStatus - -status = state.agent_status - -# Possible values: -# - AgentExecutionStatus.IDLE -# - AgentExecutionStatus.RUNNING -# - AgentExecutionStatus.FINISHED -# - AgentExecutionStatus.ERROR -# - AgentExecutionStatus.PAUSED -``` - -## Persistence - -### Saving Conversations - -Conversations are automatically persisted when `persistence_dir` is set: - -```python -conversation = Conversation( - agent=agent, - persistence_dir="conversations" # Saves to conversations/<conversation_id>/ -) -``` - -See [`examples/01_standalone_sdk/10_persistence.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py).
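Conceptually, `persistence_dir` gives each conversation its own subdirectory keyed by its ID, holding serialized state and events. A minimal self-contained sketch of that layout (illustrative only; the SDK's actual on-disk format may differ):

```python
import json
import tempfile
import uuid
from pathlib import Path

def save_events(persistence_dir: str, conversation_id: str, events: list) -> None:
    """Write the event list for one conversation under its own directory."""
    conv_dir = Path(persistence_dir) / conversation_id
    conv_dir.mkdir(parents=True, exist_ok=True)
    (conv_dir / "events.json").write_text(json.dumps(events))

def load_events(persistence_dir: str, conversation_id: str) -> list:
    """Read the event list back; resuming replays this history."""
    path = Path(persistence_dir) / conversation_id / "events.json"
    return json.loads(path.read_text())

persistence_dir = tempfile.mkdtemp()
conv_id = str(uuid.uuid4())
events = [
    {"role": "user", "content": "Create a REST API"},
    {"role": "assistant", "content": "Starting on it."},
]
save_events(persistence_dir, conv_id, events)
assert load_events(persistence_dir, conv_id) == events
```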
- -### Resuming Conversations - -Resume from a saved conversation ID: - -```python -from openhands.sdk.conversation.types import ConversationID - -# Get conversation ID -conv_id = conversation.id - -# Later, resume with the same ID -resumed_conversation = Conversation( - agent=agent, - conversation_id=conv_id, - persistence_dir="conversations" -) -``` - -## Monitoring and Stats - -**Source**: [`openhands/sdk/conversation/conversation_stats.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation_stats.py) - -### Conversation Stats - -```python -stats = conversation.conversation_stats - -print(stats.total_messages) # Total messages exchanged -print(stats.total_tokens) # Total tokens used -print(stats.total_cost) # Estimated cost -print(stats.duration) # Execution time -``` - -See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py). - -## Event Callbacks - -### Registering Callbacks - -Monitor conversation events in real-time: - -```python -from openhands.sdk.conversation import ConversationCallbackType -from openhands.sdk.event import Event - -def on_event(event: Event): - if isinstance(event, MessageEvent): - print(f"Message: {event.content}") - elif isinstance(event, ActionEvent): - print(f"Action: {event.action.kind}") - elif isinstance(event, ObservationEvent): - print(f"Observation: {event.observation.kind}") - -conversation = Conversation( - agent=agent, - callbacks=[on_event] -) -``` - -## Advanced Features - -### Stuck Detection - -**Source**: [`openhands/sdk/conversation/stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/stuck_detector.py) - -Automatically detects when agents are stuck in loops: - -```python -conversation = Conversation( - agent=agent, - stuck_detection=True # Default: True -) -``` - -See 
[`examples/01_standalone_sdk/20_stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py). - -### Secrets Management - -**Source**: [`openhands/sdk/conversation/secrets_manager.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/secrets_manager.py) - -Provide secrets for agent operations: - -```python -conversation = Conversation( - agent=agent, - secrets={ - "API_KEY": "secret-value", - "DATABASE_URL": "postgres://..." - } -) - -# Update secrets during execution -conversation.update_secrets({ - "NEW_TOKEN": "new-value" -}) -``` - -See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). - -### Visualization - -**Source**: [`openhands/sdk/conversation/visualizer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/visualizer.py) - -Generate execution visualizations: - -```python -conversation = Conversation( - agent=agent, - visualize=True # Default: True -) - -# Visualizations saved to workspace/visualizations/ -``` - -### Title Generation - -Generate conversation titles: - -```python -title = conversation.generate_title(max_length=50) -print(f"Conversation: {title}") -``` - -## Local vs Remote Conversations - -### LocalConversation - -**Source**: [`openhands/sdk/conversation/impl/local_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/local_conversation.py) - -Runs agent locally: - -```python -from openhands.sdk.workspace import LocalWorkspace - -conversation = Conversation( - agent=agent, - workspace=LocalWorkspace(working_dir="/project") -) -``` - -### RemoteConversation - -**Source**: [`openhands/sdk/conversation/impl/remote_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/remote_conversation.py) - -Runs agent on remote server: - 
-```python -from openhands.workspace import RemoteAPIWorkspace - -conversation = Conversation( - agent=agent, - workspace=RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com" - ) -) -``` - -## Best Practices - -1. **Set Appropriate Iteration Limits**: Prevent runaway executions -2. **Use Persistence**: Save important conversations for resume/replay -3. **Monitor Events**: Use callbacks for real-time monitoring -4. **Handle Errors**: Check agent status and handle failures gracefully -5. **Clean Up Resources**: Call `close()` when done -6. **Enable Stuck Detection**: Catch infinite loops early -7. **Track Stats**: Monitor token usage and costs - -## Complete Example - -```python -from openhands.sdk import Conversation, Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -# Create agent -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Create conversation -conversation = Conversation( - agent=agent, - workspace="workspace/project", - persistence_dir="conversations", - max_iteration_per_run=100 -) - -try: - # Send task - conversation.send_message("Create a simple REST API") - - # Run agent - conversation.run() - - # Get result - result = conversation.agent_final_response() - print(f"Result: {result}") - - # Check stats - stats = conversation.conversation_stats - print(f"Tokens used: {stats.total_tokens}") - print(f"Cost: ${stats.total_cost}") -finally: - # Clean up - conversation.close() -``` - -## See Also - -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration and usage -- **[Events](/sdk/architecture/sdk/event.mdx)** - Event types and handling -- **[Workspace](/sdk/architecture/sdk/workspace.mdx)** - Workspace configuration -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - Usage examples diff 
--git a/sdk/arch/sdk/event.mdx b/sdk/arch/sdk/event.mdx deleted file mode 100644 index a286dab0..00000000 --- a/sdk/arch/sdk/event.mdx +++ /dev/null @@ -1,403 +0,0 @@ ---- -title: Event System -description: Structured event types representing agent actions, observations, and system messages in conversations. ---- - -The event system provides structured representations of all interactions in agent conversations. Events enable state management, LLM communication, and real-time monitoring. - -**Source**: [`openhands/sdk/event/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/event) - -## Core Concepts - -```mermaid -graph TD - Event[Event] --> LLMConvertible[LLMConvertibleEvent] - Event --> NonConvertible[Non-LLM Events] - - LLMConvertible --> Action[ActionEvent] - LLMConvertible --> Observation[ObservationEvent] - LLMConvertible --> Message[MessageEvent] - LLMConvertible --> System[SystemPromptEvent] - - NonConvertible --> State[StateUpdateEvent] - NonConvertible --> User[UserActionEvent] - NonConvertible --> Condenser[CondenserEvent] - - style Event fill:#e1f5fe - style LLMConvertible fill:#fff3e0 - style NonConvertible fill:#e8f5e8 -``` - -Events fall into two categories: -- **LLMConvertibleEvent**: Events that become LLM messages -- **Non-LLM Events**: Internal state and control events - -## Base Event Classes - -**Source**: [`openhands/sdk/event/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/base.py) - -### Event - -Base class for all events: - -```python -from openhands.sdk.event import Event - -class Event: - id: str # Unique event identifier - timestamp: str # ISO format timestamp - source: SourceType # Event source (agent/user/system) -``` - -**Properties**: -- **Immutable**: Events are frozen Pydantic models -- **Serializable**: Full event data can be saved/restored -- **Visualizable**: Rich text representation for display - -### LLMConvertibleEvent - -Events that can be converted to LLM messages: - 
-```python -from openhands.sdk.event import LLMConvertibleEvent -from openhands.sdk.llm import Message - -class LLMConvertibleEvent(Event): - def to_llm_message(self) -> Message: - """Convert event to LLM message format.""" - ... -``` - -These events form the conversation history sent to the LLM. - -## LLM-Convertible Events - -### ActionEvent - -**Source**: [`openhands/sdk/event/llm_convertible/action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/action.py) - -Represents actions taken by the agent: - -```python -from openhands.sdk.event import ActionEvent -from openhands.sdk.tool import Action - -class ActionEvent(LLMConvertibleEvent): - action: Action # The action being executed - thought: str # Agent's reasoning (optional) -``` - -**Purpose**: Records what the agent decided to do. - -**Example**: -```python -from openhands.tools import BashAction - -action_event = ActionEvent( - source="agent", - action=BashAction(command="ls -la"), - thought="List files to understand directory structure" -) -``` - -### ObservationEvent - -**Source**: [`openhands/sdk/event/llm_convertible/observation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/observation.py) - -Represents observations from tool execution: - -```python -from openhands.sdk.event import ObservationEvent -from openhands.sdk.tool import Observation - -class ObservationEvent(LLMConvertibleEvent): - observation: Observation # Tool execution result -``` - -**Purpose**: Records the outcome of agent actions. 
- -**Example**: -```python -from openhands.tools import BashObservation - -observation_event = ObservationEvent( - source="tool", - observation=BashObservation( - output="file1.txt\nfile2.py\n", - exit_code=0 - ) -) -``` - -**Related Events**: -- **AgentErrorEvent**: Agent execution errors -- **UserRejectObservation**: User rejected an action - -### MessageEvent - -**Source**: [`openhands/sdk/event/llm_convertible/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/message.py) - -Represents messages in the conversation: - -```python -from openhands.sdk.event import MessageEvent - -class MessageEvent(LLMConvertibleEvent): - content: str | list # Message content (text or multimodal) - role: str # Role: "user", "assistant", "system" - images_urls: list[str] # Optional image URLs -``` - -**Purpose**: User messages, agent responses, and system messages. - -**Example**: -```python -message_event = MessageEvent( - source="user", - content="Create a web scraper", - role="user" -) -``` - -### SystemPromptEvent - -**Source**: [`openhands/sdk/event/llm_convertible/system.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/system.py) - -Represents system prompts: - -```python -from openhands.sdk.event import SystemPromptEvent - -class SystemPromptEvent(LLMConvertibleEvent): - content: str # System prompt content -``` - -**Purpose**: Provides instructions and context to the agent. - -## Non-LLM Events - -### ConversationStateUpdateEvent - -**Source**: [`openhands/sdk/event/conversation_state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/conversation_state.py) - -Tracks conversation state changes: - -```python -from openhands.sdk.event import ConversationStateUpdateEvent - -class ConversationStateUpdateEvent(Event): - # Internal state update event, not sent to the LLM - ... -``` - -**Purpose**: Internal tracking of conversation state transitions.
- -### PauseEvent - -**Source**: [`openhands/sdk/event/user_action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/user_action.py) - -User paused the conversation: - -```python -from openhands.sdk.event import PauseEvent - -class PauseEvent(Event): - pass -``` - -**Purpose**: Signal that user has paused agent execution. - -### Condenser Events - -**Source**: [`openhands/sdk/event/condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/condenser.py) - -Track context condensation: - -#### Condensation - -```python -class Condensation(Event): - content: str # Condensed summary -``` - -**Purpose**: Record the condensed conversation history. - -#### CondensationRequest - -```python -class CondensationRequest(Event): - pass -``` - -**Purpose**: Request context condensation. - -#### CondensationSummaryEvent - -```python -class CondensationSummaryEvent(LLMConvertibleEvent): - content: str # Summary for LLM -``` - -**Purpose**: Provide condensed context to LLM. 
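Putting the condensation events together: older events are collapsed into a single summary entry while recent events survive intact. A self-contained sketch of that replacement, with plain dataclasses standing in for the SDK's event types:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Msg:
    role: str
    content: str

def condense(history: list, preserve_recent: int = 3) -> list:
    """Collapse everything but the last N messages into one summary entry."""
    if len(history) <= preserve_recent:
        return list(history)
    old, recent = history[:-preserve_recent], history[-preserve_recent:]
    summary = Msg("system", f"[Summary of {len(old)} earlier messages]")
    return [summary] + recent

history = [Msg("user", f"message {i}") for i in range(10)]
condensed = condense(history, preserve_recent=3)
print(len(condensed))  # 1 summary + 3 recent = 4
```

In the SDK the summary travels as a `CondensationSummaryEvent` so it can be converted into an LLM message like any other convertible event.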
- -## Event Source Types - -**Source**: [`openhands/sdk/event/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/types.py) - -```python -SourceType = Literal["agent", "user", "tool", "system"] -``` - -- **agent**: Events from the agent -- **user**: Events from the user -- **tool**: Events from tool execution -- **system**: System-generated events - -## Event Streams - -### Converting to LLM Messages - -Events are converted to LLM messages for context: - -```python -from openhands.sdk.event import LLMConvertibleEvent - -events = [action_event, observation_event, message_event] -messages = LLMConvertibleEvent.events_to_messages(events) - -# Send to LLM -response = llm.completion(messages=messages) -``` - -### Event Batching - -Multiple actions in a single step are batched: - -```python -# Multi-action events -action1 = ActionEvent(action=BashAction(...)) -action2 = ActionEvent(action=FileEditAction(...)) - -# Converted to single LLM message with multiple tool calls -messages = LLMConvertibleEvent.events_to_messages([action1, action2]) -``` - -## Event Visualization - -Events support rich text visualization: - -```python -from openhands.sdk.event import Event - -event = MessageEvent( - source="user", - content="Hello", - role="user" -) - -# Rich text representation -print(event.visualize) - -# Plain text -print(str(event)) -# Output: MessageEvent (user) -# user: Hello -``` - -## Event Callbacks - -Monitor events in real-time: - -```python -from openhands.sdk import Conversation -from openhands.sdk.event import ( - Event, - ActionEvent, - ObservationEvent, - MessageEvent -) - -def on_event(event: Event): - if isinstance(event, MessageEvent): - print(f"šŸ’¬ Message: {event.content}") - elif isinstance(event, ActionEvent): - print(f"šŸ”§ Action: {event.action.kind}") - elif isinstance(event, ObservationEvent): - print(f"šŸ‘ļø Observation: {event.observation.content}") - -conversation = Conversation( - agent=agent, - callbacks=[on_event] 
-) -``` - -## Event History - -Access conversation event history: - -```python -conversation = Conversation(agent=agent) -conversation.send_message("Task") -conversation.run() - -# Get all events -events = conversation.state.events - -# Filter by type -actions = [e for e in events if isinstance(e, ActionEvent)] -observations = [e for e in events if isinstance(e, ObservationEvent)] -messages = [e for e in events if isinstance(e, MessageEvent)] -``` - -## Serialization - -Events are fully serializable: - -```python -# Serialize event -event_json = event.model_dump_json() - -# Deserialize -from openhands.sdk.event import Event -restored_event = Event.model_validate_json(event_json) -``` - -## Best Practices - -1. **Use Type Guards**: Check event types with `isinstance()` -2. **Handle All Types**: Cover all event types in callbacks -3. **Preserve Immutability**: Never mutate event objects -4. **Log Events**: Keep event history for debugging -5. **Filter Strategically**: Process only relevant events -6. 
**Visualize for Debugging**: Use `event.visualize` for rich output - -## Event Lifecycle - -```mermaid -sequenceDiagram - participant User - participant Conversation - participant Agent - participant Events - - User->>Conversation: send_message() - Conversation->>Events: MessageEvent - - Conversation->>Agent: step() - Agent->>Events: ActionEvent(s) - - Agent->>Tool: Execute - Tool->>Events: ObservationEvent(s) - - Events->>LLM: Convert to messages - LLM->>Agent: Generate response - - Agent->>Events: New ActionEvent(s) -``` - -## See Also - -- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing conversations and event streams -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent execution and event generation -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool actions and observations -- **[Condenser](/sdk/architecture/sdk/condenser.mdx)** - Context condensation events diff --git a/sdk/arch/sdk/llm.mdx b/sdk/arch/sdk/llm.mdx deleted file mode 100644 index 3a418d92..00000000 --- a/sdk/arch/sdk/llm.mdx +++ /dev/null @@ -1,416 +0,0 @@ ---- -title: LLM Integration -description: Language model integration supporting multiple providers through LiteLLM with built-in retry logic and metrics tracking. ---- - -The LLM class provides a unified interface for language model integration, supporting multiple providers through [LiteLLM](https://docs.litellm.ai/). It handles authentication, retries, metrics tracking, and streaming responses. 
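The built-in retry logic waits between attempts with exponential backoff. A self-contained sketch of the wait schedule, assuming the default parameters from the configuration table (`num_retries=8`, `retry_min_wait=3`, `retry_max_wait=60`, `retry_multiplier=2.0`); the SDK delegates the actual retrying to its retry layer:

```python
def retry_waits(
    num_retries: int = 8,
    retry_min_wait: int = 3,
    retry_max_wait: int = 60,
    retry_multiplier: float = 2.0,
) -> list:
    """Exponential backoff: start at the min wait, multiply each attempt, cap at the max."""
    waits = []
    wait = float(retry_min_wait)
    for _ in range(num_retries):
        waits.append(int(min(wait, retry_max_wait)))
        wait *= retry_multiplier
    return waits

print(retry_waits())  # [3, 6, 12, 24, 48, 60, 60, 60]
```

Capping at `retry_max_wait` keeps the last attempts from waiting minutes at a time while still spacing out early retries.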
- -**Source**: [`openhands/sdk/llm/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/llm) - -## Core Concepts - -```mermaid -graph LR - LLM[LLM] --> Completion["completion()"] - LLM --> Metrics[Metrics Tracking] - LLM --> Retry[Retry Logic] - - Completion --> Provider[Provider API] - Provider --> OpenAI[OpenAI] - Provider --> Anthropic[Anthropic] - Provider --> Others[Other Providers] - - style LLM fill:#e1f5fe - style Completion fill:#fff3e0 - style Metrics fill:#e8f5e8 - style Retry fill:#f3e5f5 -``` - -## Basic Usage - -**Source**: [`openhands/sdk/llm/llm.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/llm.py) - -### Creating an LLM - -```python -from openhands.sdk import LLM -from pydantic import SecretStr - -# Basic configuration -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") -) - -# With custom settings -llm = LLM( - model="openai/gpt-4", - api_key=SecretStr("your-api-key"), - base_url="https://api.openai.com/v1", - temperature=0.7, - max_tokens=4096, - timeout=60.0 -) -``` - -### Configuration Parameters - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `model` | `str` | `"claude-sonnet-4-20250514"` | Model identifier | -| `api_key` | `SecretStr \| None` | `None` | API key for authentication | -| `base_url` | `str \| None` | `None` | Custom API endpoint | -| `temperature` | `float` | `0.0` | Sampling temperature (0-2) | -| `max_tokens` | `int \| None` | `None` | Maximum tokens to generate | -| `timeout` | `float` | `60.0` | Request timeout in seconds | -| `num_retries` | `int` | `8` | Number of retry attempts | -| `retry_min_wait` | `int` | `3` | Minimum retry wait (seconds) | -| `retry_max_wait` | `int` | `60` | Maximum retry wait (seconds) | -| `retry_multiplier` | `float` | `2.0` | Retry backoff multiplier | - -## Generating Completions - -### Basic Completion - -```python -from openhands.sdk.llm import Message - 
-messages = [
-    Message(role="user", content="What is the capital of France?")
-]
-
-response = llm.completion(messages=messages)
-print(response.choices[0].message.content)
-# Output: "The capital of France is Paris."
-```
-
-### With Tool Calling
-
-```python
-from openhands.sdk import Agent
-from openhands.tools import BashTool
-
-# Tools are automatically converted to function schemas
-agent = Agent(
-    llm=llm,
-    tools=[BashTool.create()]
-)
-
-# The LLM receives tool schemas and can call them
-```
-
-### Streaming Responses
-
-```python
-# Enable streaming
-llm = LLM(
-    model="anthropic/claude-sonnet-4-20250514",
-    api_key=SecretStr("your-api-key"),
-    stream=True
-)
-
-# Stream response chunks
-for chunk in llm.completion(messages=messages):
-    if chunk.choices[0].delta.content:
-        print(chunk.choices[0].delta.content, end="")
-```
-
-## Model Providers
-
-The SDK supports all providers available in LiteLLM:
-
-### Anthropic
-
-```python
-llm = LLM(
-    model="anthropic/claude-sonnet-4-20250514",
-    api_key=SecretStr("sk-ant-...")
-)
-```
-
-### OpenAI
-
-```python
-llm = LLM(
-    model="openai/gpt-4",
-    api_key=SecretStr("sk-...")
-)
-```
-
-### Azure OpenAI
-
-```python
-llm = LLM(
-    model="azure/gpt-4",
-    api_key=SecretStr("your-azure-key"),
-    api_base="https://your-resource.openai.azure.com",
-    api_version="2024-02-01"
-)
-```
-
-### Custom Providers
-
-```python
-llm = LLM(
-    model="custom-provider/model-name",
-    base_url="https://custom-api.example.com/v1",
-    api_key=SecretStr("your-api-key")
-)
-```
-
-See [LiteLLM providers](https://docs.litellm.ai/docs/providers) for the full list.
-
-## LLM Registry
-
-Use pre-configured LLM instances from a shared registry.
-
-See [`examples/01_standalone_sdk/05_use_llm_registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py):
-
-```python
-from openhands.sdk.llm.registry import get_llm
-
-# Get a pre-configured LLM
-llm = get_llm(
-    model_name="claude-sonnet-4",
-    # Configuration from environment or defaults
-)
-```
-
-## Metrics and Monitoring
-
-### Tracking Metrics
-
-**Source**: [`openhands/sdk/llm/utils/metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/utils/metrics.py)
-
-```python
-# Get a metrics snapshot
-metrics = llm.metrics.snapshot()
-
-print(f"Total cost: ${metrics.accumulated_cost}")
-print(f"Requests: {metrics.total_requests}")
-```
-
-See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py).
-
-### Cost Tracking
-
-```python
-from openhands.sdk.conversation import Conversation
-
-conversation = Conversation(agent=Agent(llm=llm, tools=tools))
-conversation.send_message("Task")
-conversation.run()
-
-# Get conversation stats
-stats = conversation.conversation_stats
-print(f"Total tokens: {stats.total_tokens}")
-print(f"Estimated cost: ${stats.total_cost}")
-```
-
-## Advanced Features
-
-### LLM Routing
-
-Route requests between different LLMs based on task criteria.
-See [`examples/01_standalone_sdk/19_llm_routing.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py):
-
-```python
-# Use different LLMs for different tasks
-fast_llm = LLM(model="openai/gpt-4o-mini", api_key=SecretStr("..."))
-powerful_llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key=SecretStr("..."))
-
-# Route based on task complexity
-if task_is_simple:
-    agent = Agent(llm=fast_llm, tools=tools)
-else:
-    agent = Agent(llm=powerful_llm, tools=tools)
-```
-
-### Model Reasoning
-
-Access model reasoning from Anthropic thinking blocks and the OpenAI Responses API.
-
-See [`examples/01_standalone_sdk/22_model_reasoning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/22_model_reasoning.py):
-
-```python
-# Enable Anthropic extended thinking
-llm = LLM(
-    model="anthropic/claude-sonnet-4-20250514",
-    api_key=SecretStr("your-api-key"),
-    thinking={"type": "enabled", "budget_tokens": 1000}
-)
-
-# Or use the OpenAI Responses API for reasoning
-llm = LLM(
-    model="openai/gpt-5-codex",
-    api_key=SecretStr("your-api-key"),
-    reasoning_effort="high"
-)
-```
-
-## Error Handling
-
-### Automatic Retries
-
-The LLM class automatically retries on transient failures:
-
-```python
-from litellm.exceptions import RateLimitError, APIConnectionError
-
-# These exceptions trigger automatic retry:
-# - APIConnectionError
-# - RateLimitError
-# - ServiceUnavailableError
-# - Timeout
-# - InternalServerError
-
-# Configure retry behavior
-llm = LLM(
-    model="anthropic/claude-sonnet-4-20250514",
-    api_key=SecretStr("your-api-key"),
-    num_retries=8,        # Number of retries
-    retry_min_wait=3,     # Min wait between retries (seconds)
-    retry_max_wait=60,    # Max wait between retries (seconds)
-    retry_multiplier=2.0  # Exponential backoff multiplier
-)
-```
-
-### Exception Handling
-
-```python
-from litellm.exceptions import (
-    RateLimitError,
-    ContextWindowExceededError,
-    BadRequestError
-) - -try: - response = llm.completion(messages=messages) -except RateLimitError: - print("Rate limit exceeded, automatic retry in progress") -except ContextWindowExceededError: - print("Context window exceeded, reduce message history") -except BadRequestError as e: - print(f"Bad request: {e}") -``` - -## Message Types - -**Source**: [`openhands/sdk/llm/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py) - -### Text Messages - -```python -from openhands.sdk.llm import Message - -message = Message( - role="user", - content="Hello, how are you?" -) -``` - -### Multimodal Messages - -```python -from openhands.sdk.llm import Message, ImageContent - -message = Message( - role="user", - content=[ - "What's in this image?", - ImageContent(source="path/to/image.png") - ] -) -``` - -See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py). - -### Tool Call Messages - -```python -from openhands.sdk.llm import Message, MessageToolCall - -# Message with tool calls -message = Message( - role="assistant", - content="Let me run that command", - tool_calls=[ - MessageToolCall( - id="call_123", - function={"name": "execute_bash", "arguments": '{"command": "ls"}'} - ) - ] -) -``` - -## Model Features - -### Vision Support - -```python -from litellm.utils import supports_vision - -if supports_vision(llm.model): - # Model supports image inputs - message = Message( - role="user", - content=["Describe this image", ImageContent(source="image.png")] - ) -``` - -### Token Counting - -```python -from litellm.utils import token_counter - -# Count tokens in messages -messages = [Message(role="user", content="Hello world")] -tokens = token_counter(model=llm.model, messages=messages) -print(f"Message uses {tokens} tokens") -``` - -### Model Information - -```python -from litellm.utils import get_model_info - -info = get_model_info(llm.model) -print(f"Max 
tokens: {info['max_tokens']}") -print(f"Cost per token: {info['input_cost_per_token']}") -``` - -## Best Practices - -1. **Set Appropriate Timeouts**: Adjust based on expected response time -2. **Configure Retries**: Balance reliability with latency requirements -3. **Monitor Metrics**: Track token usage and costs -4. **Handle Exceptions**: Implement proper error handling -5. **Use Streaming**: For better user experience with long responses -6. **Secure API Keys**: Use `SecretStr` and environment variables -7. **Choose Right Model**: Balance cost, speed, and capability - -## Environment Variables - -Configure LLM via environment variables: - -```bash -# API keys -export ANTHROPIC_API_KEY="sk-ant-..." -export OPENAI_API_KEY="sk-..." -export AZURE_API_KEY="..." - -# Custom endpoints -export OPENAI_API_BASE="https://custom-endpoint.com" - -# Model defaults -export LLM_MODEL="anthropic/claude-sonnet-4-20250514" -``` - -## See Also - -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Using LLMs with agents -- **[Message Types](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py)** - Message structure -- **[LiteLLM Documentation](https://docs.litellm.ai/)** - Provider details -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - LLM usage examples diff --git a/sdk/arch/sdk/mcp.mdx b/sdk/arch/sdk/mcp.mdx deleted file mode 100644 index ea18a670..00000000 --- a/sdk/arch/sdk/mcp.mdx +++ /dev/null @@ -1,333 +0,0 @@ ---- -title: MCP Integration -description: Connect agents to external tools and services through the Model Context Protocol. ---- - -MCP (Model Context Protocol) integration enables agents to connect to external tools and services through a standardized protocol. The SDK seamlessly converts MCP tools into native agent tools. - -**Source**: [`openhands/sdk/mcp/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/mcp) - -## What is MCP? 
- -[Model Context Protocol](https://modelcontextprotocol.io/) is an open protocol that standardizes how AI applications connect to external data sources and tools. It enables: - -- **Standardized Integration**: Connect to any MCP-compliant service -- **Dynamic Discovery**: Tools are discovered at runtime -- **Multiple Transports**: Support for stdio, HTTP, and SSE -- **OAuth Support**: Secure authentication for external services - -## Basic Usage - -### Creating MCP Tools - -```python -from openhands.sdk import create_mcp_tools - -mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } -} - -# Create MCP tools -mcp_tools = create_mcp_tools(mcp_config, timeout=30) - -# Use with agent -from openhands.sdk import Agent -from openhands.tools import BashTool - -agent = Agent( - llm=llm, - tools=[ - BashTool.create(), - *mcp_tools # Add MCP tools - ] -) -``` - -See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py). - -### Using MCP Config in Agent - -```python -# Simpler: provide MCP config directly to agent -agent = Agent( - llm=llm, - tools=[BashTool.create()], - mcp_config={ - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } - } -) -``` - -## Configuration Formats - -The SDK uses the [FastMCP configuration format](https://gofastmcp.com/clients/client#configuration-format). 
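Before handing a config to `create_mcp_tools`, it can help to sanity-check its shape. The helper below is purely illustrative (it is not part of the SDK) and encodes one assumption about the FastMCP format: each server entry is either stdio-style (a `command` with optional `args`/`env`/`cwd`) or remote-style (a `url` with an `http`/`sse` transport):

```python
# Illustrative only: a minimal validator for the mcpServers config shape
# described above. Not part of the SDK; it just encodes the assumption that
# each entry provides either "command" (stdio) or "url" (http/sse).

def validate_mcp_config(config: dict) -> list[str]:
    """Return a list of problems found in an mcpServers config (empty if OK)."""
    problems = []
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["config must contain a non-empty 'mcpServers' mapping"]
    for name, entry in servers.items():
        has_command = "command" in entry
        has_url = "url" in entry
        if has_command == has_url:  # neither, or both
            problems.append(f"{name}: provide exactly one of 'command' or 'url'")
        transport = entry.get("transport")
        if has_url and transport not in (None, "http", "sse"):
            problems.append(f"{name}: remote servers use 'http' or 'sse' transport")
        if has_command and transport not in (None, "stdio"):
            problems.append(f"{name}: command-based servers use 'stdio' transport")
    return problems


config = {
    "mcpServers": {
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
        "remote_api": {"transport": "http", "url": "https://api.example.com/mcp"},
    }
}
print(validate_mcp_config(config))  # []
```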
- -### Stdio Servers - -Run local MCP servers via stdio: - -```python -mcp_config = { - "mcpServers": { - "filesystem": { - "transport": "stdio", # Optional, default - "command": "python", - "args": ["./mcp-server-filesystem.py"], - "env": {"DEBUG": "true"}, - "cwd": "/path/to/server" - } - } -} -``` - -### HTTP/SSE Servers - -Connect to remote MCP servers: - -```python -mcp_config = { - "mcpServers": { - "remote_api": { - "transport": "http", # or "sse" - "url": "https://api.example.com/mcp", - "headers": { - "Authorization": "Bearer token" - } - } - } -} -``` - -### OAuth Authentication - -Authenticate with OAuth-enabled services: - -```python -mcp_config = { - "mcpServers": { - "google_drive": { - "transport": "http", - "url": "https://mcp.google.com/drive", - "auth": "oauth", # Enable OAuth flow - } - } -} -``` - -See [`examples/01_standalone_sdk/08_mcp_with_oauth.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py). - -## Available MCP Servers - -Popular MCP servers you can integrate: - -### Official Servers - -- **fetch**: HTTP requests ([mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)) -- **filesystem**: File operations ([mcp-server-filesystem](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem)) -- **git**: Git operations ([mcp-server-git](https://github.com/modelcontextprotocol/servers/tree/main/src/git)) -- **github**: GitHub API ([mcp-server-github](https://github.com/modelcontextprotocol/servers/tree/main/src/github)) -- **postgres**: PostgreSQL queries ([mcp-server-postgres](https://github.com/modelcontextprotocol/servers/tree/main/src/postgres)) - -### Community Servers - -See [MCP Servers Directory](https://github.com/modelcontextprotocol/servers) for more. 
-
-## MCP Tool Conversion
-
-MCP tools are automatically converted to SDK tools:
-
-```mermaid
-graph LR
-    MCPServer[MCP Server] --> Discovery[Tool Discovery]
-    Discovery --> Schema[Tool Schema]
-    Schema --> SDKTool[SDK Tool]
-    SDKTool --> Agent[Agent]
-
-    style MCPServer fill:#e1f5fe
-    style SDKTool fill:#fff3e0
-    style Agent fill:#e8f5e8
-```
-
-1. **Discovery**: The MCP server lists available tools
-2. **Schema Extraction**: Tool schemas are extracted from MCP
-3. **Tool Creation**: SDK tools are created with proper typing
-4. **Agent Integration**: Tools become available to the agent
-
-## Configuration Options
-
-### Timeout
-
-Set the connection timeout for MCP servers:
-
-```python
-mcp_tools = create_mcp_tools(mcp_config, timeout=60)  # 60 seconds
-```
-
-### Multiple Servers
-
-Configure multiple MCP servers:
-
-```python
-mcp_config = {
-    "mcpServers": {
-        "fetch": {
-            "command": "uvx",
-            "args": ["mcp-server-fetch"]
-        },
-        "filesystem": {
-            "command": "uvx",
-            "args": ["mcp-server-filesystem"]
-        },
-        "github": {
-            "command": "uvx",
-            "args": ["mcp-server-github"]
-        }
-    }
-}
-```
-
-All tools from all servers are available to the agent.
-
-## Error Handling
-
-```python
-try:
-    mcp_tools = create_mcp_tools(mcp_config, timeout=30)
-except TimeoutError:
-    print("MCP server connection timed out")
-except Exception as e:
-    print(f"Failed to create MCP tools: {e}")
-    mcp_tools = []  # Continue without MCP tools
-
-agent = Agent(llm=llm, tools=[*base_tools, *mcp_tools])
-```
-
-## Tool Filtering
-
-Filter MCP tools using a regex:
-
-```python
-agent = Agent(
-    llm=llm,
-    tools=tools,
-    mcp_config=mcp_config,
-    filter_tools_regex="^fetch_.*"  # Only tools starting with "fetch_"
-)
-```
-
-## Best Practices
-
-1. **Set Appropriate Timeouts**: MCP servers may take time to initialize
-2. **Handle Failures Gracefully**: Continue with reduced functionality if MCP fails
-3. **Use Official Servers**: Start with well-tested MCP servers
-4. **Secure Credentials**: Use environment variables for sensitive data
-5. **Test Locally First**: Verify MCP servers work before deploying
-6. **Monitor Performance**: MCP adds latency; monitor the impact
-7. **Version Pin**: Specify exact versions of MCP servers
-
-## Environment Variables
-
-Configure MCP servers via the environment:
-
-```bash
-# GitHub MCP server
-export GITHUB_PERSONAL_ACCESS_TOKEN="ghp_..."
-
-# Google Drive OAuth
-export GOOGLE_CLIENT_ID="..."
-export GOOGLE_CLIENT_SECRET="..."
-
-# Custom MCP endpoints
-export MCP_FETCH_URL="https://custom-mcp.example.com"
-```
-
-## Advanced Usage
-
-### Custom MCP Client
-
-For advanced control, use the MCP client directly:
-
-```python
-from openhands.sdk.mcp.client import MCPClient
-
-# Create a custom MCP client
-client = MCPClient(
-    server_config={
-        "command": "python",
-        "args": ["./custom-server.py"]
-    },
-    timeout=60
-)
-
-# Get tools from the client
-tools = client.list_tools()
-
-# Use the tools with an agent
-agent = Agent(llm=llm, tools=tools)
-```
-
-## Debugging
-
-### Enable Debug Logging
-
-```python
-import logging
-
-logging.getLogger("openhands.sdk.mcp").setLevel(logging.DEBUG)
-```
-
-### Verify MCP Server
-
-Test the MCP server independently:
-
-```bash
-# Run a stdio MCP server directly
-uvx mcp-server-fetch
-
-# For HTTP servers: check that the endpoint responds
-curl http://localhost:3000/mcp/tools
-```
-
-## Common Issues
-
-### Server Not Found
-
-```bash
-# Ensure the server is installed
-# For uvx-based servers:
-uvx --help  # Check if uvx is available
-uvx mcp-server-fetch --help  # Check if the server is available
-```
-
-### Connection Timeout
-
-```python
-# Increase the timeout
-mcp_tools = create_mcp_tools(mcp_config, timeout=120)
-```
-
-### OAuth Flow Issues
-
-- Ensure OAuth credentials are configured
-- Check that a browser opens for the OAuth consent screen
-- Verify the redirect URL matches the configuration
-
-## See Also
-
-- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Official MCP documentation
-- **[MCP
Servers](https://github.com/modelcontextprotocol/servers)** - Official server implementations -- **[FastMCP](https://gofastmcp.com/)** - Configuration format documentation -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - SDK tool system -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py)** - MCP integration examples diff --git a/sdk/arch/sdk/microagents.mdx b/sdk/arch/sdk/microagents.mdx deleted file mode 100644 index 00c95dd8..00000000 --- a/sdk/arch/sdk/microagents.mdx +++ /dev/null @@ -1,225 +0,0 @@ ---- -title: Microagents -description: Specialized context providers that inject targeted knowledge into agent conversations. ---- - -Microagents are specialized context providers that inject targeted knowledge into agent conversations when specific triggers are detected. They enable modular, reusable expertise without modifying the main agent. - -## What are Microagents? - -Microagents provide focused knowledge or instructions that are dynamically added to the agent's context when relevant keywords are detected in the conversation. This allows agents to access specialized expertise on-demand. - -For a comprehensive guide on using microagents, see the [official microagents documentation](https://docs.all-hands.dev/usage/prompting/microagents-overview). - -**Source**: [`openhands/sdk/context/microagents/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/microagents) - -## Microagent Types - -**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) - -The SDK provides three types of microagents, each serving a distinct purpose: - -### 1. 
KnowledgeMicroagent - -**Source**: [`openhands/sdk/context/microagents/microagent.py#L162`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L162) - -Provides specialized expertise triggered by keywords in conversations. - -**Activation Logic:** -- Contains a list of trigger keywords -- Activated when any trigger appears in conversation -- Case-insensitive matching - -**Use Cases:** -- Language best practices (Python, JavaScript, etc.) -- Framework guidelines (React, Django, etc.) -- Common patterns and anti-patterns -- Tool usage instructions - -**Example:** -```python -from openhands.sdk.context.microagents import KnowledgeMicroagent - -microagent = KnowledgeMicroagent( - name="python_testing", - content="Always use pytest for Python tests...", - triggers=["pytest", "test", "unittest"] -) - -# Triggered when message contains "pytest", "test", or "unittest" -``` - -### 2. RepoMicroagent - -**Source**: [`openhands/sdk/context/microagents/microagent.py#L191`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L191) - -Repository-specific knowledge that's always active when working with a repository. - -**Activation Logic:** -- No triggers required -- Always loaded and active for the repository -- Can define MCP tools configuration - -**Use Cases:** -- Repository-specific guidelines -- Team practices and conventions -- Project-specific workflows -- Custom documentation references - -**Special Files:** -- `.openhands_instructions` - Legacy repo instructions -- `.cursorrules` - Cursor IDE rules (auto-loaded) -- `agents.md` / `agent.md` - Agent instructions (auto-loaded) - -**Example:** -```python -from openhands.sdk.context.microagents import RepoMicroagent - -microagent = RepoMicroagent( - name="project_guidelines", - content="This project uses...", - mcp_tools={"github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]}} -) -``` - -### 3. 
TaskMicroagent - -**Source**: [`openhands/sdk/context/microagents/microagent.py#L236`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L236) - -Specialized KnowledgeMicroagent that requires user input before execution. - -**Activation Logic:** -- Triggered by `/{agent_name}` format -- Prompts user for required inputs -- Processes inputs before injecting knowledge - -**Use Cases:** -- Deployment procedures requiring credentials -- Multi-step workflows with parameters -- Interactive debugging sessions -- Customized task execution - -**Example:** -```python -from openhands.sdk.context.microagents import TaskMicroagent, InputMetadata - -microagent = TaskMicroagent( - name="deploy", - content="Deploy to {environment} with {version}...", - triggers=["/deploy"], - inputs=[ - InputMetadata(name="environment", type="string", required=True), - InputMetadata(name="version", type="string", required=True) - ] -) - -# User types: "/deploy" -# Agent prompts: "Enter environment:" "Enter version:" -# Agent proceeds with filled template -``` - -## How Microagents Work - -```mermaid -sequenceDiagram - participant User - participant Agent - participant Microagent - participant LLM - - User->>Agent: "Run the tests" - Agent->>Agent: Detect keyword "tests" - Agent->>Microagent: Fetch testing microagent - Microagent->>Agent: Return testing guidelines - Agent->>LLM: Context + guidelines - LLM->>Agent: Response with testing knowledge - Agent->>User: Execute tests with guidelines -``` - -## Using Microagents - -### Basic Usage - -```python -from openhands.sdk import Agent, AgentContext - -# Create context with microagents -context = AgentContext( - microagents=["testing_expert", "code_reviewer"] -) - -# Create agent with microagents -agent = Agent( - llm=llm, - tools=tools, - agent_context=context -) -``` - -### Example Integration - -See 
[`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py) for a complete example. - -## Microagent Structure - -**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) - -A microagent consists of: -- **Name**: Unique identifier -- **Triggers**: Keywords that activate the microagent -- **Content**: Knowledge or instructions to inject -- **Type**: One of "knowledge", "repo", or "task" - -## Response Models - -**Source**: [`openhands/sdk/context/microagents/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/types.py) - -### MicroagentKnowledge - -```python -class MicroagentKnowledge(BaseModel): - name: str # Microagent name - trigger: str # Keyword that triggered it - content: str # Injected content -``` - -### MicroagentResponse - -```python -class MicroagentResponse(BaseModel): - name: str # Microagent name - path: str # Path or identifier - created_at: datetime # Creation timestamp -``` - -### MicroagentContentResponse - -```python -class MicroagentContentResponse(BaseModel): - content: str # Full microagent content - path: str # Path or identifier - triggers: list[str] # Trigger keywords - git_provider: str | None # Git source if applicable -``` - -## Benefits - -1. **Modularity**: Separate specialized knowledge from main agent logic -2. **Reusability**: Share microagents across multiple agents -3. **Maintainability**: Update expertise without modifying agent code -4. **Context-Aware**: Only inject relevant knowledge when needed -5. **Composability**: Combine multiple microagents for comprehensive coverage - -## Best Practices - -1. **Clear Triggers**: Use specific, unambiguous trigger keywords -2. **Focused Content**: Keep microagent content concise and targeted -3. 
**Avoid Overlap**: Minimize trigger conflicts between microagents -4. **Version Control**: Store microagents in version-controlled repositories -5. **Documentation**: Document trigger keywords and intended use cases - -## See Also - -- **[Official Microagents Guide](https://docs.all-hands.dev/usage/prompting/microagents-overview)** - Comprehensive documentation -- **[Agent Context](/sdk/architecture/sdk/agent.mdx)** - Using context with agents -- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py)** - Working example diff --git a/sdk/arch/sdk/security.mdx b/sdk/arch/sdk/security.mdx deleted file mode 100644 index a41264fc..00000000 --- a/sdk/arch/sdk/security.mdx +++ /dev/null @@ -1,416 +0,0 @@ ---- -title: Security -description: Analyze and control agent actions through security analyzers and confirmation policies. ---- - -The security system enables control over agent actions through risk analysis and confirmation policies. It helps prevent dangerous operations while maintaining agent autonomy for safe actions. 
- -**Source**: [`openhands/sdk/security/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/security) - -## Core Concepts - -```mermaid -graph TD - Action[Agent Action] --> Analyzer[Security Analyzer] - Analyzer --> Risk[Risk Assessment] - Risk --> Policy[Confirmation Policy] - - Policy --> Low{Risk Level} - Low -->|Low| Execute[Execute] - Low -->|Medium| MaybeConfirm[Policy Decision] - Low -->|High| Confirm[Request Confirmation] - - Confirm --> User[User Decision] - User -->|Approve| Execute - User -->|Reject| Block[Block Action] - - style Action fill:#e1f5fe - style Analyzer fill:#fff3e0 - style Policy fill:#e8f5e8 - style Execute fill:#c8e6c9 - style Block fill:#ffcdd2 -``` - -The security system consists of two components: -- **Security Analyzer**: Assesses risk level of actions -- **Confirmation Policy**: Decides when to require user confirmation - -## Security Analyzer - -### LLM Security Analyzer - -**Source**: [`openhands/sdk/security/llm_analyzer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/llm_analyzer.py) - -Uses an LLM to analyze action safety: - -```python -from openhands.sdk.security import LLMSecurityAnalyzer -from openhands.sdk import Agent, LLM -from pydantic import SecretStr - -# Create security analyzer -security_analyzer = LLMSecurityAnalyzer( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ) -) - -# Create agent with security analyzer -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer -) -``` - -### Risk Levels - -**Source**: [`openhands/sdk/security/risk.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/risk.py) - -```python -from openhands.sdk.security.risk import SecurityRisk - -# Risk levels -SecurityRisk.LOW # Safe operations (read files, list directories) -SecurityRisk.MEDIUM # Potentially impactful (write files, API calls) -SecurityRisk.HIGH # Dangerous operations (delete files, 
system changes) -``` - -### How LLM Analyzer Works - -1. **Action Inspection**: Examines the action and its parameters -2. **Context Analysis**: Considers conversation history and workspace -3. **Risk Assessment**: LLM predicts risk level with reasoning -4. **Risk Return**: Returns risk level and explanation - -```python -# Example internal flow -action = BashAction(command="rm -rf /") -risk = security_analyzer.analyze(action, context) -# Returns: SecurityRisk.HIGH, "Attempting to delete entire filesystem" -``` - -### Custom Security Analyzer - -Implement custom risk analysis: - -```python -from openhands.sdk.security.analyzer import SecurityAnalyzerBase -from openhands.sdk.security.risk import SecurityRisk -from openhands.sdk.tool import Action - -class PatternBasedAnalyzer(SecurityAnalyzerBase): - dangerous_patterns = ["rm -rf", "sudo", "DROP TABLE"] - - def analyze( - self, - action: Action, - context: dict - ) -> tuple[SecurityRisk, str]: - command = getattr(action, "command", "") - - for pattern in self.dangerous_patterns: - if pattern in command: - return ( - SecurityRisk.HIGH, - f"Dangerous pattern detected: {pattern}" - ) - - return SecurityRisk.LOW, "No dangerous patterns found" -``` - -## Confirmation Policies - -**Source**: [`openhands/sdk/security/confirmation_policy.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/confirmation_policy.py) - -### Built-in Policies - -#### NeverConfirm - -Never request confirmation (default): - -```python -from openhands.sdk.security import NeverConfirm - -agent = Agent( - llm=llm, - tools=tools, - confirmation_policy=NeverConfirm() -) -``` - -#### AlwaysConfirm - -Always request confirmation: - -```python -from openhands.sdk.security import AlwaysConfirm - -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=AlwaysConfirm() -) -``` - -#### ConfirmOnHighRisk - -Confirm only high-risk actions: - -```python -from openhands.sdk.security import 
ConfirmOnHighRisk - -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=ConfirmOnHighRisk() -) -``` - -#### ConfirmOnMediumOrHighRisk - -Confirm medium and high-risk actions: - -```python -from openhands.sdk.security import ConfirmOnMediumOrHighRisk - -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=ConfirmOnMediumOrHighRisk() -) -``` - -### Custom Confirmation Policy - -Implement custom confirmation logic: - -```python -from openhands.sdk.security.confirmation_policy import ConfirmationPolicyBase -from openhands.sdk.security.risk import SecurityRisk -from openhands.sdk.tool import Action - -class TimeBasedPolicy(ConfirmationPolicyBase): - """Require confirmation during business hours.""" - - def should_confirm( - self, - action: Action, - risk: SecurityRisk, - context: dict - ) -> bool: - from datetime import datetime - - hour = datetime.now().hour - - # Business hours: always confirm high risk - if 9 <= hour <= 17: - return risk >= SecurityRisk.HIGH - - # Off hours: confirm medium and high risk - return risk >= SecurityRisk.MEDIUM -``` - -## Using Security System - -### Basic Setup - -```python -from openhands.sdk import Agent, LLM, Conversation -from openhands.sdk.security import ( - LLMSecurityAnalyzer, - ConfirmOnHighRisk -) -from pydantic import SecretStr - -# Create analyzer -security_analyzer = LLMSecurityAnalyzer( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ) -) - -# Create agent with security -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=ConfirmOnHighRisk() -) - -# Use in conversation -conversation = Conversation(agent=agent) -``` - -See [`examples/01_standalone_sdk/04_human_in_the_loop.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py). 
- -### Handling Confirmations - -```python -from openhands.sdk import Conversation -from openhands.sdk.conversation.state import AgentExecutionStatus - -conversation = Conversation(agent=agent) -conversation.send_message("Delete all temporary files") - -# Run agent -conversation.run() - -# Check if waiting for confirmation -if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: - print("Action requires confirmation:") - # Show pending action details - - # User approves - conversation.confirm_pending_action() - conversation.run() - - # Or user rejects - # conversation.reject_pending_action(reason="Too risky") -``` - -### Dynamic Policy Changes - -Change confirmation policy during execution: - -```python -from openhands.sdk.security import AlwaysConfirm, NeverConfirm - -conversation = Conversation(agent=agent) - -# Start with strict policy -conversation.set_confirmation_policy(AlwaysConfirm()) -conversation.send_message("Sensitive task") -conversation.run() - -# Switch to permissive policy -conversation.set_confirmation_policy(NeverConfirm()) -conversation.send_message("Safe task") -conversation.run() -``` - -## Security Workflow - -```mermaid -sequenceDiagram - participant Agent - participant Analyzer - participant Policy - participant User - participant Tool - - Agent->>Analyzer: analyze(action) - Analyzer->>Analyzer: Assess risk - Analyzer->>Agent: risk + explanation - - Agent->>Policy: should_confirm(action, risk) - Policy->>Policy: Apply policy rules - - alt No confirmation needed - Policy->>Agent: execute - Agent->>Tool: Execute action - Tool->>Agent: Observation - else Confirmation required - Policy->>User: Request approval - User->>Policy: Approve/Reject - alt Approved - Policy->>Agent: execute - Agent->>Tool: Execute action - else Rejected - Policy->>Agent: block - Agent->>Agent: UserRejectObservation - end - end -``` - -## Best Practices - -1. **Use LLM Analyzer**: Provides nuanced risk assessment -2. 
**Start Conservative**: Begin with strict policies, relax as needed -3. **Monitor Blocked Actions**: Review what's being blocked -4. **Provide Context**: Better context enables better risk assessment -5. **Test Security Setup**: Verify policies work as expected -6. **Document Policies**: Explain confirmation requirements to users -7. **Handle Rejections**: Implement proper error handling for rejected actions - -## Performance Considerations - -### LLM Analyzer Overhead - -LLM security analysis adds latency: -- **Cost**: Additional LLM call per action -- **Latency**: ~1-2 seconds per analysis -- **Tokens**: ~500-1000 tokens per analysis - -```python -# Only use with confirmation policy -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, # Costs tokens - confirmation_policy=ConfirmOnHighRisk() # Must be used together -) -``` - -### Optimization Tips - -1. **Cache Similar Actions**: Reuse assessments for similar actions -2. **Use Faster Models**: Consider faster LLMs for security analysis -3. **Pattern-Based Pre-Filter**: Use pattern matching before LLM analysis -4. 
**Batch Analysis**: Analyze multiple actions together when possible - -## Security Best Practices - -### Principle of Least Privilege - -```python -# Provide only necessary tools -agent = Agent( - llm=llm, - tools=[ - FileEditorTool.create(), # Safe file operations - # Don't include BashTool for untrusted tasks - ] -) -``` - -### Sandbox Execution - -```python -# Use DockerWorkspace for isolation -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - image="ubuntu:22.04" -) - -conversation = Conversation(agent=agent, workspace=workspace) -``` - -### Secrets Management - -```python -# Provide secrets securely -conversation = Conversation( - agent=agent, - secrets={ - "API_KEY": "secret-value", - "PASSWORD": "secure-password" - } -) -``` - -See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). - -## See Also - -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration with security -- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Handling confirmations -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool security considerations -- **[Human-in-the-Loop Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py)** - Complete example diff --git a/sdk/arch/sdk/tool.mdx b/sdk/arch/sdk/tool.mdx deleted file mode 100644 index 3bbe737e..00000000 --- a/sdk/arch/sdk/tool.mdx +++ /dev/null @@ -1,199 +0,0 @@ ---- -title: Tool System -description: Define custom tools for agents to interact with external systems through typed action/observation patterns. ---- - -The tool system enables agents to interact with external systems and perform actions. Tools follow a typed action/observation pattern with comprehensive validation and schema generation. 
-
-## Core Concepts
-
-```mermaid
-graph LR
-    Action[Action] --> Tool[Tool]
-    Tool --> Executor[ToolExecutor]
-    Executor --> Observation[Observation]
-
-    style Action fill:#e1f5fe
-    style Tool fill:#f3e5f5
-    style Executor fill:#fff3e0
-    style Observation fill:#e8f5e8
-```
-
-A tool consists of three components:
-- **Action**: Input schema defining tool parameters
-- **ToolExecutor**: Logic that executes the tool
-- **Observation**: Output schema with execution results
-
-**Source**: [`openhands/sdk/tool/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool)
-
-## Defining Custom Tools
-
-### 1. Define Action and Observation
-
-**Source**: [`openhands/sdk/tool/schema.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/schema.py)
-
-```python
-from openhands.sdk.tool import Action, Observation
-
-class CalculateAction(Action):
-    """Action to perform calculation."""
-    expression: str
-    precision: int = 2
-
-class CalculateObservation(Observation):
-    """Result of calculation."""
-    result: float
-    success: bool
-    error: str | None = None  # Populated when evaluation fails
-```
-
-### 2. Implement ToolExecutor
-
-**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py)
-
-```python
-from openhands.sdk.tool import ToolExecutor
-
-class CalculateExecutor(ToolExecutor[CalculateAction, CalculateObservation]):
-    def __call__(self, action: CalculateAction) -> CalculateObservation:
-        try:
-            # NOTE: eval() is used here for brevity; never evaluate untrusted input this way
-            result = eval(action.expression)
-            return CalculateObservation(
-                result=round(result, action.precision),
-                success=True
-            )
-        except Exception as e:
-            return CalculateObservation(
-                result=0.0,
-                success=False,
-                error=str(e)
-            )
-```
-
-### 3. Create Tool Class
-
-```python
-from openhands.sdk.tool import Tool
-
-class CalculateTool(Tool[CalculateAction, CalculateObservation]):
-    name: str = "calculate"
-    description: str = "Evaluate mathematical expressions"
-    action_type: type[Action] = CalculateAction
-    observation_type: type[Observation] = CalculateObservation
-
-    @classmethod
-    def create(cls) -> list["CalculateTool"]:
-        executor = CalculateExecutor()
-        return [cls().set_executor(executor)]
-```
-
-### Complete Example
-
-See [`examples/01_standalone_sdk/02_custom_tools.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) for a working example.
-
-## Built-in Tools
-
-**Source**: [`openhands/sdk/tool/builtins/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool/builtins)
-
-### FinishTool
-
-**Source**: [`openhands/sdk/tool/builtins/finish.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/finish.py)
-
-Signals task completion with optional output.
-
-```python
-from openhands.sdk.tool.builtins import FinishTool
-
-# Automatically included with agents
-finish_tool = FinishTool.create()
-```
-
-### ThinkTool
-
-**Source**: [`openhands/sdk/tool/builtins/think.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/think.py)
-
-Enables internal reasoning without external actions.
- -```python -from openhands.sdk.tool.builtins import ThinkTool - -# Automatically included with agents -think_tool = ThinkTool.create() -``` - -## Tool Annotations - -**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py) - -Provide hints about tool behavior following [MCP spec](https://modelcontextprotocol.io/): - -```python -from openhands.sdk.tool import ToolAnnotations - -annotations = ToolAnnotations( - title="Calculate", - readOnlyHint=True, # Tool doesn't modify environment - destructiveHint=False, # Tool doesn't perform destructive updates - idempotentHint=True, # Same input produces same output - openWorldHint=False # Tool doesn't interact with external entities -) - -class CalculateTool(Tool[CalculateAction, CalculateObservation]): - annotations: ToolAnnotations = annotations - # ... rest of tool definition -``` - -## Tool Registry - -**Source**: [`openhands/sdk/tool/registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/registry.py) - -Tools are automatically registered when defined. The registry manages tool discovery and schema generation for LLM function calling. 
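The registry's schema generation can be illustrated with a stand-alone sketch that derives a function-calling schema from an action class's type hints. This is not the SDK's implementation (the SDK builds schemas from Pydantic models); the class and field names here mirror the earlier calculator example purely for illustration:

```python
from typing import get_type_hints

class CalculateAction:
    """Evaluate a mathematical expression."""
    expression: str
    precision: int = 2

TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def action_to_schema(action_cls: type, name: str) -> dict:
    """Derive a function-calling schema from an action class's annotations."""
    hints = get_type_hints(action_cls)
    properties = {field: {"type": TYPE_MAP[hint]} for field, hint in hints.items()}
    # Fields without a class-level default are required
    required = [field for field in hints if not hasattr(action_cls, field)]
    return {
        "name": name,
        "description": (action_cls.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

schema = action_to_schema(CalculateAction, "calculate")
```

The resulting dictionary has the shape LLM providers expect for tool definitions: a name, a description taken from the docstring, and a JSON-Schema-style parameter object with required fields.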
- -## Advanced Patterns - -### Stateful Executors - -Executors can maintain state across executions: - -```python -class DatabaseExecutor(ToolExecutor[QueryAction, QueryObservation]): - def __init__(self, connection_string: str): - self.connection = connect(connection_string) - - def __call__(self, action: QueryAction) -> QueryObservation: - result = self.connection.execute(action.query) - return QueryObservation(rows=result.fetchall()) - - def close(self) -> None: - """Clean up resources.""" - self.connection.close() -``` - -### Dynamic Tool Creation - -Create tools with runtime configuration: - -```python -class ConfigurableTool(Tool[MyAction, MyObservation]): - @classmethod - def create(cls, api_key: str, endpoint: str) -> list["ConfigurableTool"]: - executor = MyExecutor(api_key=api_key, endpoint=endpoint) - return [cls().set_executor(executor)] - -# Use with different configurations -tool1 = ConfigurableTool.create(api_key="key1", endpoint="https://api1.com") -tool2 = ConfigurableTool.create(api_key="key2", endpoint="https://api2.com") -``` - -## Best Practices - -1. **Type Safety**: Use Pydantic models for actions and observations -2. **Error Handling**: Always handle exceptions in executors -3. **Resource Management**: Implement `close()` for cleanup -4. **Clear Descriptions**: Provide detailed docstrings for LLM understanding -5. 
**Validation**: Leverage Pydantic validators for input validation - -## See Also - -- **[Pre-defined Tools](/sdk/architecture/tools/)** - Ready-to-use tool implementations -- **[MCP Integration](/sdk/architecture/sdk/mcp.mdx)** - Connect to external MCP tools -- **[Agent Usage](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents diff --git a/sdk/arch/sdk/workspace.mdx b/sdk/arch/sdk/workspace.mdx deleted file mode 100644 index 42d61900..00000000 --- a/sdk/arch/sdk/workspace.mdx +++ /dev/null @@ -1,322 +0,0 @@ ---- -title: Workspace Interface -description: Abstract interface for agent execution environments supporting local and remote operations. ---- - -The workspace interface defines how agents interact with their execution environment. It provides a unified API for file operations and command execution, supporting both local and remote environments. - -**Source**: [`openhands/sdk/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace) - -## Core Concepts - -```mermaid -graph TD - BaseWorkspace[BaseWorkspace] --> Local[LocalWorkspace] - BaseWorkspace --> Remote[RemoteWorkspace] - - Local --> FileOps[File Operations] - Local --> CmdExec[Command Execution] - - Remote --> Docker[DockerWorkspace] - Remote --> API[RemoteAPIWorkspace] - - style BaseWorkspace fill:#e1f5fe - style Local fill:#e8f5e8 - style Remote fill:#fff3e0 -``` - -A workspace provides: -- **File Operations**: Upload, download, read, write -- **Command Execution**: Run bash commands with timeout support -- **Resource Management**: Context manager protocol for cleanup -- **Flexibility**: Local development or remote sandboxed execution - -## Base Interface - -**Source**: [`openhands/sdk/workspace/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/base.py) - -### BaseWorkspace - -Abstract base class defining the workspace interface: - -```python -from openhands.sdk.workspace import BaseWorkspace - -class 
CustomWorkspace(BaseWorkspace): - working_dir: str # Required: working directory path - - def execute_command( - self, - command: str, - cwd: str | None = None, - timeout: float = 30.0 - ) -> CommandResult: - """Execute bash command.""" - ... - - def file_upload( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - """Upload file to workspace.""" - ... - - def file_download( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - """Download file from workspace.""" - ... -``` - -### Context Manager Protocol - -All workspaces support the context manager protocol for safe resource management: - -```python -with workspace: - result = workspace.execute_command("echo 'hello'") - # Workspace automatically cleans up on exit -``` - -## LocalWorkspace - -**Source**: [`openhands/sdk/workspace/local.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/local.py) - -Executes operations directly on the local machine. - -```python -from openhands.sdk.workspace import LocalWorkspace - -workspace = LocalWorkspace(working_dir="/path/to/project") - -# Execute command -result = workspace.execute_command("ls -la") -print(result.stdout) - -# Upload file (copy) -workspace.file_upload("local_file.txt", "workspace_file.txt") - -# Download file (copy) -workspace.file_download("workspace_file.txt", "local_copy.txt") -``` - -**Use Cases**: -- Local development and testing -- Direct file system access -- No sandboxing required -- Fast execution without network overhead - -## RemoteWorkspace - -**Source**: [`openhands/sdk/workspace/remote/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace/remote) - -Abstract base for remote execution environments. 
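The command-execution and context-manager contract shared by all workspaces can be sketched with a toy, stand-alone implementation. This is not the SDK's `LocalWorkspace`; the `MiniWorkspace` class below is a hypothetical illustration of the same interface using only the standard library:

```python
import subprocess
import time
from dataclasses import dataclass

@dataclass
class CommandResult:
    stdout: str
    stderr: str
    exit_code: int
    duration: float

class MiniWorkspace:
    """Toy workspace sketch: runs commands via subprocess in a working directory."""

    def __init__(self, working_dir: str):
        self.working_dir = working_dir

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # nothing to clean up in this sketch

    def execute_command(self, command: str, timeout: float = 30.0) -> CommandResult:
        start = time.monotonic()
        proc = subprocess.run(
            command, shell=True, cwd=self.working_dir,
            capture_output=True, text=True, timeout=timeout,
        )
        return CommandResult(
            proc.stdout, proc.stderr, proc.returncode,
            time.monotonic() - start,
        )
```

A remote workspace implements the same `execute_command` signature, but ships the command over the network instead of spawning a local subprocess, which is why agent code can stay identical across environments.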
- -### RemoteWorkspace Mixin - -**Source**: [`openhands/sdk/workspace/remote/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/base.py) - -Provides common functionality for remote workspaces: -- Network communication -- File transfer protocols -- Command execution over API -- Resource cleanup - -### AsyncRemoteWorkspace - -**Source**: [`openhands/sdk/workspace/remote/async_remote_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/async_remote_workspace.py) - -Async version for concurrent operations. - -## Concrete Remote Implementations - -Remote workspace implementations are provided in the `workspace` package: - -### DockerWorkspace - -**Source**: See [workspace/docker documentation](/sdk/architecture/workspace/docker.mdx) - -Executes operations in an isolated Docker container. - -```python -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - image="ubuntu:22.04", - container_name="agent-sandbox" -) - -with workspace: - result = workspace.execute_command("python script.py") -``` - -**Benefits**: -- Strong isolation and sandboxing -- Reproducible environments -- Resource limits and security -- Clean slate for each session - -### RemoteAPIWorkspace - -**Source**: See [workspace/remote_api documentation](/sdk/architecture/workspace/remote_api.mdx) - -Connects to a remote agent server via API. 
- -```python -from openhands.workspace import RemoteAPIWorkspace - -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com", - api_key="your-api-key" -) - -with workspace: - result = workspace.execute_command("npm test") -``` - -**Benefits**: -- Centralized agent execution -- Shared resources and caching -- Scalable architecture -- Remote monitoring and logging - -## Result Models - -**Source**: [`openhands/sdk/workspace/models.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/models.py) - -### CommandResult - -```python -class CommandResult(BaseModel): - stdout: str # Standard output - stderr: str # Standard error - exit_code: int # Exit code (0 = success) - duration: float # Execution time in seconds -``` - -### FileOperationResult - -```python -class FileOperationResult(BaseModel): - success: bool # Operation success status - message: str # Status message - path: str # File path -``` - -## Usage with Agents - -Workspaces integrate with agents through tools: - -```python -from openhands.sdk import Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from openhands.sdk.workspace import LocalWorkspace - -# Create workspace -workspace = LocalWorkspace(working_dir="/project") - -# Create tools with workspace -tools = [ - BashTool.create(working_dir=workspace.working_dir), - FileEditorTool.create() -] - -# Create agent -agent = Agent(llm=llm, tools=tools) -``` - -## Local vs Remote Comparison - -| Feature | LocalWorkspace | RemoteWorkspace | -|---------|---------------|-----------------| -| **Execution** | Local machine | Remote server/container | -| **Isolation** | None | Strong (Docker/API) | -| **Performance** | Fast | Network latency | -| **Security** | Host system | Sandboxed environment | -| **Setup** | Simple | Requires infrastructure | -| **Use Case** | Development | Production/Multi-user | - -## Advanced Usage - -### Custom Workspace Implementation - -```python -from 
openhands.sdk.workspace import BaseWorkspace -from openhands.sdk.workspace.models import CommandResult, FileOperationResult - -class CloudWorkspace(BaseWorkspace): - working_dir: str - cloud_instance_id: str - - def execute_command( - self, - command: str, - cwd: str | None = None, - timeout: float = 30.0 - ) -> CommandResult: - # Execute on cloud instance - response = self.cloud_api.run_command( - instance_id=self.cloud_instance_id, - command=command - ) - return CommandResult( - stdout=response.stdout, - stderr=response.stderr, - exit_code=response.exit_code, - duration=response.duration - ) - - def file_upload( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - # Upload to cloud storage - ... - - def file_download( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - # Download from cloud storage - ... -``` - -### Error Handling - -```python -from openhands.sdk.workspace import LocalWorkspace - -workspace = LocalWorkspace(working_dir="/project") - -try: - result = workspace.execute_command("risky_command", timeout=60.0) - if result.exit_code != 0: - print(f"Command failed: {result.stderr}") -except TimeoutError: - print("Command timed out") -except Exception as e: - print(f"Execution error: {e}") -``` - -## Best Practices - -1. **Use Context Managers**: Always use `with` statements for proper cleanup -2. **Set Appropriate Timeouts**: Prevent hanging on long-running commands -3. **Validate Working Directory**: Ensure paths exist before operations -4. **Handle Errors**: Check exit codes and handle exceptions -5. **Choose Right Workspace**: Local for development, remote for production -6. 
**Resource Limits**: Set appropriate resource limits for remote workspaces - -## See Also - -- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based sandboxing -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution -- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Remote agent execution server -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace usage examples diff --git a/sdk/arch/tools/bash.mdx b/sdk/arch/tools/bash.mdx deleted file mode 100644 index 3497307c..00000000 --- a/sdk/arch/tools/bash.mdx +++ /dev/null @@ -1,288 +0,0 @@ ---- -title: BashTool -description: Execute bash commands with persistent session support, timeout control, and environment management. ---- - -BashTool enables agents to execute bash commands in a persistent session with full control over working directory, environment variables, and execution timeout. - -**Source**: [`openhands/tools/execute_bash/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash) - -## Overview - -BashTool provides: -- Persistent bash session across multiple commands -- Environment variable management -- Timeout control for long-running commands -- Working directory configuration -- Support for both local and remote execution - -## Usage - -### Basic Usage - -```python -from openhands.tools import BashTool - -# Create tool -bash_tool = BashTool.create() - -# Use with agent -from openhands.sdk import Agent - -agent = Agent( - llm=llm, - tools=[bash_tool] -) -``` - -### With Configuration - -```python -bash_tool = BashTool.create( - working_dir="/project/path", - timeout=60.0 # 60 seconds -) -``` - -## Action Model - -**Source**: [`openhands/tools/execute_bash/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/definition.py) - -```python -class BashAction(Action): - command: str # Bash 
command to execute
-    thought: str = ""           # Optional reasoning
-```
-
-### Example
-
-```python
-from openhands.tools import BashAction
-
-action = BashAction(
-    command="ls -la",
-    thought="List files to understand directory structure"
-)
-```
-
-## Observation Model
-
-```python
-class BashObservation(Observation):
-    output: str      # Command output (stdout + stderr)
-    exit_code: int   # Exit code (0 = success)
-```
-
-### Example
-
-```python
-# Successful execution
-observation = BashObservation(
-    output="file1.txt\nfile2.py\n",
-    exit_code=0
-)
-
-# Failed execution
-observation = BashObservation(
-    output="command not found: invalid_cmd\n",
-    exit_code=127
-)
-```
-
-## Features
-
-### Persistent Session
-
-Commands execute in the same bash session, preserving:
-- Environment variables
-- Working directory changes
-- Shell state
-
-```python
-# Set an environment variable
-BashAction(command="export API_KEY=secret")
-
-# Use it in a later action; the session persists, so the variable is still set
-BashAction(command="echo $API_KEY")  # Outputs: secret
-```
-
-### Terminal Types
-
-**Source**: [`openhands/tools/execute_bash/terminal/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash/terminal)
-
-BashTool supports multiple terminal implementations:
-
-- **SubprocessTerminal**: Direct subprocess execution (default)
-- **TmuxTerminal**: Tmux-based persistent sessions
-
-### Timeout Control
-
-Commands automatically time out after the specified duration:
-
-```python
-bash_tool = BashTool.create(timeout=30.0)  # 30 second timeout
-
-# A long-running command will be terminated
-action = BashAction(command="sleep 60")  # Times out after 30s
-```
-
-### Environment Management
-
-Set custom environment variables:
-
-```python
-# Via workspace secrets
-from openhands.sdk import Conversation
-
-conversation = Conversation(
-    agent=agent,
-    secrets={
-        "DATABASE_URL": "postgres://...",
-        "API_KEY": "secret"
-    }
-)
-```
-
-See
[`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). - -## Common Use Cases - -### File Operations - -```python -# Create directory -BashAction(command="mkdir -p /path/to/dir") - -# Copy files -BashAction(command="cp source.txt dest.txt") - -# Find files -BashAction(command="find . -name '*.py'") -``` - -### Build and Test - -```python -# Install dependencies -BashAction(command="pip install -r requirements.txt") - -# Run tests -BashAction(command="pytest tests/") - -# Build project -BashAction(command="npm run build") -``` - -### Git Operations - -```python -# Clone repository -BashAction(command="git clone https://github.com/user/repo.git") - -# Create branch -BashAction(command="git checkout -b feature-branch") - -# Commit changes -BashAction(command='git commit -m "Add feature"') -``` - -### System Information - -```python -# Check disk space -BashAction(command="df -h") - -# List processes -BashAction(command="ps aux") - -# Network information -BashAction(command="ifconfig") -``` - -## Best Practices - -1. **Set Appropriate Timeouts**: Prevent hanging on long commands -2. **Use Absolute Paths**: Or configure working directory explicitly -3. **Check Exit Codes**: Verify command success in agent logic -4. **Escape Special Characters**: Properly quote arguments -5. **Avoid Interactive Commands**: BashTool works best with non-interactive commands -6. 
**Use Security Analysis**: Enable for sensitive operations - -## Security Considerations - -### Risk Assessment - -BashTool actions have varying risk levels: - -- **LOW**: Read operations (`ls`, `cat`, `grep`) -- **MEDIUM**: Write operations (`touch`, `mkdir`, `echo >`) -- **HIGH**: Destructive operations (`rm -rf`, `sudo`, `chmod`) - -### Enable Security - -```python -from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk - -agent = Agent( - llm=llm, - tools=[BashTool.create()], - security_analyzer=LLMSecurityAnalyzer(llm=llm), - confirmation_policy=ConfirmOnHighRisk() -) -``` - -### Sandboxing - -Use DockerWorkspace for isolation: - -```python -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - image="ubuntu:22.04" -) - -conversation = Conversation(agent=agent, workspace=workspace) -``` - -## Error Handling - -### Common Exit Codes - -- `0`: Success -- `1`: General error -- `2`: Misuse of shell builtin -- `126`: Command not executable -- `127`: Command not found -- `130`: Terminated by Ctrl+C -- `137`: Killed by SIGKILL (timeout) - -### Handling Failures - -```python -# Agent can check observation -if observation.exit_code != 0: - # Handle error based on output - if "permission denied" in observation.output.lower(): - # Retry with different approach - pass -``` - -## Implementation Details - -**Source**: [`openhands/tools/execute_bash/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/impl.py) - -The tool uses a terminal interface that: -1. Initializes a persistent bash session -2. Executes commands with timeout support -3. Captures stdout and stderr -4. Returns exit codes -5. 
Handles session cleanup - -## See Also - -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For file manipulation -- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools -- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/browser_use.mdx b/sdk/arch/tools/browser_use.mdx deleted file mode 100644 index bd52db73..00000000 --- a/sdk/arch/tools/browser_use.mdx +++ /dev/null @@ -1,101 +0,0 @@ ---- -title: BrowserUseTool -description: Web browsing and interaction capabilities powered by browser-use integration. ---- - -BrowserUseTool enables agents to interact with web pages, navigate websites, and extract web content through an integrated browser. - -**Source**: [`openhands/tools/browser_use/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/browser_use) - -## Overview - -BrowserUseTool provides: -- Web page navigation -- Element interaction (click, type, etc.) -- Content extraction -- Screenshot capture -- JavaScript execution - -## Usage - -```python -from openhands.tools import BrowserUseTool - -agent = Agent(llm=llm, tools=[BrowserUseTool.create()]) -``` - -## Features - -### Web Navigation - -- Navigate to URLs -- Follow links -- Browser back/forward -- Page refresh - -### Element Interaction - -- Click elements -- Fill forms -- Submit data -- Select dropdowns - -### Content Extraction - -- Extract text content -- Get element attributes -- Capture screenshots -- Parse structured data - -## Use Cases - -### Web Scraping - -```python -# Navigate to page and extract data -# Agent can use browser to: -# 1. Navigate to target URL -# 2. Wait for content to load -# 3. Extract desired information -# 4. Return structured data -``` - -### Web Testing - -```python -# Test web applications -# Agent can: -# 1. Navigate to application -# 2. Fill out forms -# 3. 
Click buttons -# 4. Verify expected behavior -``` - -### Research - -```python -# Research information online -# Agent can: -# 1. Search for information -# 2. Navigate search results -# 3. Extract relevant content -# 4. Synthesize findings -``` - -## Integration - -BrowserUseTool is powered by the [browser-use](https://github.com/browser-use/browser-use) library, providing robust web automation capabilities. - -## Best Practices - -1. **Handle Loading**: Wait for page content to load -2. **Error Handling**: Handle navigation and interaction failures -3. **Rate Limiting**: Be respectful of target websites -4. **Security**: Avoid sensitive operations in browser -5. **Timeouts**: Set appropriate timeouts for operations - -## See Also - -- **[browser-use](https://github.com/browser-use/browser-use)** - Underlying browser automation library -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For local command execution -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For processing extracted content diff --git a/sdk/arch/tools/file_editor.mdx b/sdk/arch/tools/file_editor.mdx deleted file mode 100644 index fff65d25..00000000 --- a/sdk/arch/tools/file_editor.mdx +++ /dev/null @@ -1,338 +0,0 @@ ---- -title: FileEditorTool -description: Edit files with diff-based operations, undo support, and intelligent line-based modifications. ---- - -FileEditorTool provides powerful file editing capabilities with diff-based operations, undo/redo support, and intelligent line-based modifications. It's designed for precise code and text file manipulation. 
- -**Source**: [`openhands/tools/file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/file_editor) - -## Overview - -FileEditorTool provides: -- View file contents with line numbers -- Insert, delete, and replace lines -- String-based find-and-replace -- Undo/redo support -- Automatic diff generation -- File history tracking - -## Usage - -```python -from openhands.tools import FileEditorTool - -# Create tool -file_editor = FileEditorTool.create() - -# Use with agent -agent = Agent(llm=llm, tools=[file_editor]) -``` - -## Available Commands - -**Source**: [`openhands/tools/file_editor/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/definition.py) - -### view -View file contents with line numbers. - -```python -FileEditAction( - command="view", - path="script.py" -) -``` - -Optional parameters: -- `view_range=[start, end]`: View specific line range - -### create -Create a new file with content. - -```python -FileEditAction( - command="create", - path="newfile.py", - file_text="print('Hello, World!')\n" -) -``` - -### str_replace -Replace a string in the file. - -```python -FileEditAction( - command="str_replace", - path="script.py", - old_str="old_function()", - new_str="new_function()" -) -``` - -### insert -Insert text after a specific line. - -```python -FileEditAction( - command="insert", - path="script.py", - insert_line=10, - new_str=" # New code here\n" -) -``` - -### undo_edit -Undo the last edit operation. 
- -```python -FileEditAction( - command="undo_edit", - path="script.py" -) -``` - -## Action Model - -```python -class FileEditAction(Action): - command: Literal["view", "create", "str_replace", "insert", "undo_edit"] - path: str # File path - file_text: str | None = None # For create - old_str: str | None = None # For str_replace - new_str: str | None = None # For str_replace/insert - insert_line: int | None = None # For insert - view_range: list[int] | None = None # For view -``` - -## Observation Model - -```python -class FileEditObservation(Observation): - content: str # Result message or file content - success: bool # Operation success status - diff: str | None = None # Unified diff for changes -``` - -## Features - -### Diff Generation - -Automatic diff generation for all modifications: - -```python -# After edit -observation = FileEditObservation( - content="File edited successfully", - success=True, - diff=""" ---- script.py -+++ script.py -@@ -1,3 +1,3 @@ - def main(): -- print("old") -+ print("new") -""" -) -``` - -### Edit History - -Track file modification history with undo support: - -```python -# Edit file -action1 = FileEditAction(command="str_replace", path="file.py", ...) - -# Make another edit -action2 = FileEditAction(command="insert", path="file.py", ...) 
- -# Undo last edit -action3 = FileEditAction(command="undo_edit", path="file.py") -``` - -### Line-Based Operations - -All operations work with line numbers for precision: - -```python -# View specific lines -FileEditAction( - command="view", - path="large_file.py", - view_range=[100, 150] # View lines 100-150 -) - -# Insert at specific line -FileEditAction( - command="insert", - path="script.py", - insert_line=25, - new_str=" new_code()\n" -) -``` - -### String Replacement - -Find and replace with exact matching: - -```python -# Must match exactly including whitespace -FileEditAction( - command="str_replace", - path="config.py", - old_str="DEBUG = False\nLOG_LEVEL = 'INFO'", - new_str="DEBUG = True\nLOG_LEVEL = 'DEBUG'" -) -``` - -## Common Use Cases - -### Creating Files - -```python -# Create Python script -FileEditAction( - command="create", - path="hello.py", - file_text="#!/usr/bin/env python3\nprint('Hello, World!')\n" -) - -# Create configuration file -FileEditAction( - command="create", - path="config.json", - file_text='{"setting": "value"}\n' -) -``` - -### Viewing Files - -```python -# View entire file -FileEditAction(command="view", path="README.md") - -# View specific section -FileEditAction( - command="view", - path="large_file.py", - view_range=[1, 50] -) - -# View end of file -FileEditAction( - command="view", - path="log.txt", - view_range=[-20, -1] # Last 20 lines -) -``` - -### Refactoring Code - -```python -# Rename function -FileEditAction( - command="str_replace", - path="module.py", - old_str="def old_name(arg):", - new_str="def new_name(arg):" -) - -# Add import -FileEditAction( - command="insert", - path="script.py", - insert_line=0, - new_str="import numpy as np\n" -) - -# Fix bug -FileEditAction( - command="str_replace", - path="buggy.py", - old_str=" if x = 5:", - new_str=" if x == 5:" -) -``` - -## Best Practices - -1. **View Before Editing**: Always view file content first -2. 
**Exact String Matching**: Ensure `old_str` matches exactly -3. **Include Context**: Include surrounding lines for uniqueness -4. **Use Line Numbers**: View with line numbers for precise edits -5. **Check Success**: Verify `observation.success` before proceeding -6. **Review Diffs**: Check generated diffs for accuracy -7. **Use Undo Sparingly**: Undo only when necessary - -## Error Handling - -### Common Errors - -```python -# File not found -FileEditObservation( - content="Error: File 'missing.py' not found", - success=False -) - -# String not found -FileEditObservation( - content="Error: old_str not found in file", - success=False -) - -# Multiple matches -FileEditObservation( - content="Error: old_str matched multiple locations", - success=False -) - -# Invalid line number -FileEditObservation( - content="Error: insert_line out of range", - success=False -) -``` - -### Recovery Strategies - -```python -# If string not found, view file first -if not observation.success and "not found" in observation.content: - # View file to understand current content - view_action = FileEditAction(command="view", path=path) -``` - -## Implementation Details - -**Source**: [`openhands/tools/file_editor/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/impl.py) - -The editor maintains: -- **File Cache**: Efficient file content caching -- **Edit History**: Per-file undo stack -- **Diff Engine**: Unified diff generation -- **Encoding Detection**: Automatic encoding handling - -## Configuration - -**Source**: [`openhands/tools/file_editor/utils/config.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/utils/config.py) - -```python -# Constants -MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB -MAX_HISTORY_SIZE = 100 # Max undo operations -``` - -## Security Considerations - -- File operations are restricted to working directory -- No execution of file content -- Safe for user-generated content -- Automatic encoding 
detection prevents binary file issues - -## See Also - -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For file system operations -- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing -- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/glob.mdx b/sdk/arch/tools/glob.mdx deleted file mode 100644 index 8983d0af..00000000 --- a/sdk/arch/tools/glob.mdx +++ /dev/null @@ -1,89 +0,0 @@ ---- -title: GlobTool -description: Find files using glob patterns with recursive search and flexible matching. ---- - -GlobTool enables file discovery using glob patterns, supporting recursive search, wildcards, and flexible path matching. - -**Source**: [`openhands/tools/glob/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/glob) - -## Usage - -```python -from openhands.tools import GlobTool - -agent = Agent(llm=llm, tools=[GlobTool.create()]) -``` - -## Action Model - -**Source**: [`openhands/tools/glob/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/glob/definition.py) - -```python -class GlobAction(Action): - pattern: str # Glob pattern (e.g., "**/*.py") -``` - -## Observation Model - -```python -class GlobObservation(Observation): - paths: list[str] # List of matching file paths -``` - -## Pattern Syntax - -- `*`: Match any characters except `/` -- `**`: Match any characters including `/` (recursive) -- `?`: Match single character -- `[abc]`: Match any character in brackets -- `[!abc]`: Match any character not in brackets - -## Examples - -### Find Python Files - -```python -GlobAction(pattern="**/*.py") -# Returns: ["src/main.py", "tests/test_main.py", ...] -``` - -### Find Specific Files - -```python -GlobAction(pattern="**/test_*.py") -# Returns: ["tests/test_api.py", "tests/test_utils.py", ...] 
-``` - -### Multiple Extensions - -```python -GlobAction(pattern="**/*.{py,js,ts}") -# Returns: ["script.py", "app.js", "types.ts", ...] -``` - -### Current Directory Only - -```python -GlobAction(pattern="*.txt") -# Returns: ["readme.txt", "notes.txt", ...] -``` - -## Common Use Cases - -- **Code Discovery**: `**/*.py` - Find all Python files -- **Test Files**: `**/test_*.py` - Find test files -- **Configuration**: `**/*.{json,yaml,yml}` - Find config files -- **Documentation**: `**/*.md` - Find markdown files - -## Best Practices - -1. **Use Recursive Patterns**: `**/*` for deep searches -2. **Specific Extensions**: Narrow results with extensions -3. **Combine with GrepTool**: Find files, then search content -4. **Check Results**: Handle empty result lists - -## See Also - -- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative file operations diff --git a/sdk/arch/tools/grep.mdx b/sdk/arch/tools/grep.mdx deleted file mode 100644 index bd879318..00000000 --- a/sdk/arch/tools/grep.mdx +++ /dev/null @@ -1,140 +0,0 @@ ---- -title: GrepTool -description: Search file contents using regex patterns with context and match highlighting. ---- - -GrepTool enables content search across files using regex patterns, providing context around matches and detailed results. - -**Source**: [`openhands/tools/grep/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/grep) - -## Usage - -```python -from openhands.tools import GrepTool - -agent = Agent(llm=llm, tools=[GrepTool.create()]) -``` - -## Action Model - -**Source**: [`openhands/tools/grep/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/grep/definition.py) - -```python -class GrepAction(Action): - pattern: str # Regex pattern to search - path: str = "." 
# Directory or file to search - case_sensitive: bool = False # Case sensitivity -``` - -## Observation Model - -```python -class GrepObservation(Observation): - matches: list[dict] # List of matches with context - # Each match contains: - # - file: str - File path - # - line: int - Line number - # - content: str - Matching line -``` - -## Examples - -### Search for Function Definition - -```python -GrepAction( - pattern=r"def\s+\w+\(", - path="src/", - case_sensitive=False -) -# Returns: [ -# {"file": "src/main.py", "line": 10, "content": "def process_data(x):"}, -# ... -# ] -``` - -### Case-Sensitive Search - -```python -GrepAction( - pattern="TODO", - path=".", - case_sensitive=True -) -# Only matches exact case "TODO" -``` - -### Search Specific File - -```python -GrepAction( - pattern="import.*pandas", - path="script.py" -) -``` - -## Pattern Syntax - -Supports Python regex patterns: -- `.`: Any character -- `*`: Zero or more -- `+`: One or more -- `?`: Optional -- `[]`: Character class -- `()`: Group -- `|`: Alternation -- `^`: Line start -- `$`: Line end - -## Common Use Cases - -### Find TODOs - -```python -GrepAction(pattern=r"TODO|FIXME|XXX", path=".") -``` - -### Find Imports - -```python -GrepAction(pattern=r"^import |^from .* import ", path="src/") -``` - -### Find API Keys (for security review) - -```python -GrepAction(pattern=r"api[_-]key|secret|password", path=".") -``` - -### Find Function Calls - -```python -GrepAction(pattern=r"database\.query\(", path=".") -``` - -## Best Practices - -1. **Escape Special Characters**: Use `\` for regex special chars -2. **Use Anchors**: `^` and `$` for line boundaries -3. **Case Insensitive Default**: Unless exact case matters -4. **Narrow Search Paths**: Search specific directories -5. **Combine with GlobTool**: Find files first, then grep - -## Workflow Pattern - -```python -# 1. Find relevant files -glob_action = GlobAction(pattern="**/*.py") - -# 2. 
Search content in those files -grep_action = GrepAction( - pattern="class.*Exception", - path="src/" -) -``` - -## See Also - -- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - View/edit files -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative with `grep` command diff --git a/sdk/arch/tools/overview.mdx b/sdk/arch/tools/overview.mdx deleted file mode 100644 index aadf3f01..00000000 --- a/sdk/arch/tools/overview.mdx +++ /dev/null @@ -1,185 +0,0 @@ ---- -title: Tools Overview -description: Pre-built tools for common agent operations including bash execution, file editing, and code search. ---- - -The `openhands.tools` package provides a collection of pre-built, production-ready tools for common agent operations. These tools enable agents to interact with files, execute commands, search code, and manage tasks. - -**Source**: [`openhands/tools/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools) - -## Available Tools - -### Core Tools - -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Execute bash commands with timeout and environment support -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Edit files with diff-based operations and undo support -- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning workflows - -### Search Tools - -- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files using glob patterns -- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents with regex support - -### Specialized Tools - -- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Track and manage agent tasks -- **[BrowserUseTool](/sdk/architecture/tools/browser_use.mdx)** - Web browsing and interaction - -## Quick Start - -### Using Individual Tools - -```python -from openhands.sdk import Agent, LLM -from openhands.tools import BashTool, FileEditorTool 
-from pydantic import SecretStr - -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[ - BashTool.create(), - FileEditorTool.create() - ] -) -``` - -### Using Tool Presets - -```python -from openhands.tools.preset import get_default_tools, get_planning_tools - -# Default toolset for general tasks -default_tools = get_default_tools() - -# Specialized toolset for planning workflows -planning_tools = get_planning_tools() - -agent = Agent(llm=llm, tools=default_tools) -``` - -## Tool Structure - -All tools follow a consistent structure: - -```mermaid -graph TD - Tool[Tool Definition] --> Action[Action Model] - Tool --> Observation[Observation Model] - Tool --> Executor[Executor Implementation] - - Action --> Params[Input Parameters] - Observation --> Result[Output Data] - Executor --> Execute[execute() method] - - style Tool fill:#e1f5fe - style Action fill:#fff3e0 - style Observation fill:#e8f5e8 - style Executor fill:#f3e5f5 -``` - -### Tool Components - -1. **Action**: Input model defining tool parameters -2. **Observation**: Output model containing execution results -3. 
**Executor**: Implementation that executes the tool logic - -## Tool Presets - -**Source**: [`openhands/tools/preset/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/preset) - -### Default Preset - -**Source**: [`openhands/tools/preset/default.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/default.py) - -General-purpose toolset for most tasks: - -```python -from openhands.tools.preset import get_default_tools - -tools = get_default_tools() -# Includes: BashTool, FileEditorTool, GlobTool, GrepTool -``` - -### Planning Preset - -**Source**: [`openhands/tools/preset/planning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py) - -Optimized for planning and multi-file workflows: - -```python -from openhands.tools.preset import get_planning_tools - -tools = get_planning_tools() -# Includes: BashTool, PlanningFileEditorTool, GlobTool, GrepTool, TaskTrackerTool -``` - -## Creating Custom Tools - -See the [Tool Definition Guide](/sdk/architecture/sdk/tool.mdx) for creating custom tools. - -## Tool Security - -Tools support security risk assessment: - -```python -from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk - -agent = Agent( - llm=llm, - tools=[BashTool.create(), FileEditorTool.create()], - security_analyzer=LLMSecurityAnalyzer(llm=llm), - confirmation_policy=ConfirmOnHighRisk() -) -``` - -See [Security Documentation](/sdk/architecture/sdk/security.mdx) for more details. - -## Tool Configuration - -### Working Directory - -Most tools operate relative to a working directory: - -```python -from openhands.tools import BashTool - -bash_tool = BashTool.create(working_dir="/project/path") -``` - -### Timeout Settings - -Configure execution timeouts: - -```python -from openhands.tools import BashTool - -bash_tool = BashTool.create(timeout=60.0) # 60 seconds -``` - -## Best Practices - -1. **Use Presets**: Start with tool presets for common workflows -2. 
**Configure Timeouts**: Set appropriate timeouts for tools -3. **Provide Context**: Use working directories effectively -4. **Enable Security**: Add security analysis for sensitive operations -5. **Filter Tools**: Use `filter_tools_regex` to limit available tools -6. **Test Locally**: Verify tools work in your environment - -## Tool Examples - -Each tool has comprehensive examples: - -- **[Bash Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Command execution -- **[File Editor Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - File manipulation -- **[Planning Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Planning workflows -- **[Task Tracker Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Task management - -## See Also - -- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools -- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents -- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Complete examples diff --git a/sdk/arch/tools/planning_file_editor.mdx b/sdk/arch/tools/planning_file_editor.mdx deleted file mode 100644 index e176c93b..00000000 --- a/sdk/arch/tools/planning_file_editor.mdx +++ /dev/null @@ -1,128 +0,0 @@ ---- -title: PlanningFileEditorTool -description: Multi-file editing tool optimized for planning workflows with batch operations. ---- - -PlanningFileEditorTool extends FileEditorTool with multi-file editing capabilities optimized for planning agent workflows. 
- -**Source**: [`openhands/tools/planning_file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/planning_file_editor) - -## Overview - -PlanningFileEditorTool provides: -- All FileEditorTool capabilities -- Optimized for planning workflows -- Batch file operations -- Coordination with TaskTrackerTool - -## Usage - -```python -from openhands.tools import PlanningFileEditorTool - -agent = Agent(llm=llm, tools=[PlanningFileEditorTool.create()]) -``` - -## Relation to FileEditorTool - -PlanningFileEditorTool inherits all FileEditorTool commands: -- `view`: View file contents -- `create`: Create new files -- `str_replace`: Replace strings -- `insert`: Insert lines -- `undo_edit`: Undo changes - -See [FileEditorTool](/sdk/architecture/tools/file_editor.mdx) for detailed command documentation. - -## Planning Workflow Integration - -```mermaid -graph TD - Plan[Create Task Plan] --> TaskTracker[TaskTrackerTool] - TaskTracker --> Edit[Edit Files] - Edit --> PlanningEditor[PlanningFileEditorTool] - PlanningEditor --> UpdateTasks[Update Task Status] - UpdateTasks --> TaskTracker - - style Plan fill:#fff3e0 - style Edit fill:#e1f5fe - style UpdateTasks fill:#e8f5e8 -``` - -## Usage in Planning Workflows - -See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): - -```python -from openhands.tools.preset import get_planning_tools - -# Get planning toolset (includes PlanningFileEditorTool) -tools = get_planning_tools() - -agent = Agent(llm=llm, tools=tools) -``` - -## Multi-File Workflow Example - -```python -# 1. Plan tasks -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Create config file", status="todo"), - Task(title="Create main script", status="todo"), - Task(title="Create tests", status="todo") - ] -) - -# 2. 
Create files -PlanningFileEditAction( - command="create", - path="config.yaml", - file_text="settings:\n debug: true\n" -) - -PlanningFileEditAction( - command="create", - path="main.py", - file_text="import yaml\n\nif __name__ == '__main__':\n pass\n" -) - -# 3. Update task status -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Create config file", status="done"), - Task(title="Create main script", status="done"), - Task(title="Create tests", status="in_progress") - ] -) -``` - -## Best Practices - -1. **Use with TaskTrackerTool**: Coordinate file edits with task status -2. **Plan Before Editing**: Create task plan first -3. **Update Progress**: Mark tasks complete after edits -4. **Follow Workflow**: Plan → Edit → Update → Repeat -5. **Use Planning Preset**: Get all planning tools together - -## When to Use - -Use PlanningFileEditorTool when: -- Building complex multi-file projects -- Following structured planning workflows -- Coordinating with task tracking -- Need agent to manage implementation phases - -Use regular FileEditorTool for: -- Simple file editing tasks -- Single-file modifications -- Ad-hoc editing without planning - -## See Also - -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Base file editing capabilities -- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Task management -- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Complete planning toolset -- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Full workflow example diff --git a/sdk/arch/tools/task_tracker.mdx b/sdk/arch/tools/task_tracker.mdx deleted file mode 100644 index 73966ef4..00000000 --- a/sdk/arch/tools/task_tracker.mdx +++ /dev/null @@ -1,146 +0,0 @@ ---- -title: TaskTrackerTool -description: Track and manage agent tasks with status updates and structured task lists. 
---- - -TaskTrackerTool enables agents to create, update, and manage task lists for complex multi-step workflows. - -**Source**: [`openhands/tools/task_tracker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/task_tracker) - -## Usage - -```python -from openhands.tools import TaskTrackerTool - -agent = Agent(llm=llm, tools=[TaskTrackerTool.create()]) -``` - -## Action Model - -**Source**: [`openhands/tools/task_tracker/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/task_tracker/definition.py) - -```python -class TaskTrackerAction(Action): - command: Literal["view", "plan"] - task_list: list[Task] | None = None # For plan command -``` - -### Task Model - -```python -class Task: - title: str # Task title - status: Literal["todo", "in_progress", "done"] # Task status - notes: str | None = None # Optional notes -``` - -## Observation Model - -```python -class TaskTrackerObservation(Observation): - task_list: list[Task] # Current task list - message: str # Status message -``` - -## Commands - -### view -View current task list. - -```python -TaskTrackerAction(command="view") -``` - -### plan -Create or update task list. 
- -```python -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Setup environment", status="done"), - Task(title="Write code", status="in_progress"), - Task(title="Run tests", status="todo") - ] -) -``` - -## Usage Patterns - -### Initialize Task List - -```python -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Analyze requirements", status="todo"), - Task(title="Design solution", status="todo"), - Task(title="Implement features", status="todo"), - Task(title="Write tests", status="todo"), - Task(title="Deploy", status="todo") - ] -) -``` - -### Update Progress - -```python -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Analyze requirements", status="done"), - Task(title="Design solution", status="in_progress"), - Task(title="Implement features", status="todo"), - Task(title="Write tests", status="todo"), - Task(title="Deploy", status="todo") - ] -) -``` - -### Check Current Status - -```python -TaskTrackerAction(command="view") -# Returns current task list with status -``` - -## Best Practices - -1. **Plan Early**: Create task list at workflow start -2. **Update Regularly**: Mark tasks as progress happens -3. **Use Notes**: Add details for complex tasks -4. **One Task Active**: Focus on one "in_progress" task -5. 
**Mark Complete**: Set "done" when finished - -## Task Status Workflow - -```mermaid -graph LR - TODO[todo] -->|Start work| PROGRESS[in_progress] - PROGRESS -->|Complete| DONE[done] - DONE -->|Reopen if needed| TODO - - style TODO fill:#fff3e0 - style PROGRESS fill:#e1f5fe - style DONE fill:#c8e6c9 -``` - -## Example: Planning Agent - -See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): - -```python -# Planning agent uses TaskTrackerTool for workflow management -from openhands.tools.preset import get_planning_tools - -agent = Agent( - llm=llm, - tools=get_planning_tools() # Includes TaskTrackerTool -) -``` - -## See Also - -- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning -- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Planning toolset -- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Complete workflow diff --git a/sdk/arch/workspace/docker.mdx b/sdk/arch/workspace/docker.mdx deleted file mode 100644 index 4c26fd52..00000000 --- a/sdk/arch/workspace/docker.mdx +++ /dev/null @@ -1,330 +0,0 @@ ---- -title: DockerWorkspace -description: Execute agent operations in isolated Docker containers with automatic container lifecycle management. ---- - -DockerWorkspace provides isolated execution environments using Docker containers. It automatically manages container lifecycle, networking, and resource allocation. 
- -**Source**: [`openhands/workspace/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/docker) - -## Overview - -DockerWorkspace provides: -- Automatic container creation and cleanup -- Network isolation and port management -- Custom or pre-built Docker images -- Environment variable forwarding -- File system mounting -- Resource limits and controls - -## Usage - -### Basic Usage - -```python -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - base_image="python:3.12" -) - -with workspace: - result = workspace.execute_command("python --version") - print(result.stdout) # Python 3.12.x -``` - -### With Pre-built Image - -```python -workspace = DockerWorkspace( - working_dir="/workspace", - server_image="ghcr.io/all-hands-ai/agent-server:latest" -) -``` - -## Configuration - -**Source**: [`openhands/workspace/docker/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/docker/workspace.py) - -### Core Parameters - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `working_dir` | `str` | `"/workspace"` | Working directory in container | -| `base_image` | `str \| None` | `None` | Base image to build agent server from | -| `server_image` | `str \| None` | `None` | Pre-built agent server image | -| `host_port` | `int \| None` | `None` | Host port to bind (auto-assigned if None) | -| `forward_env` | `list[str]` | `["DEBUG"]` | Environment variables to forward | -| `container_name` | `str \| None` | `None` | Container name (auto-generated if None) | -| `platform` | `str \| None` | `None` | Target platform (e.g., "linux/amd64") | - -### Using Base Image - -Build agent server on top of custom base image: - -```python -workspace = DockerWorkspace( - base_image="ubuntu:22.04", - working_dir="/workspace" -) -``` - -Agent server components are installed on top of the base image. 
- -### Using Pre-built Server Image - -Use pre-built agent server image: - -```python -workspace = DockerWorkspace( - server_image="ghcr.io/all-hands-ai/agent-server:latest", - working_dir="/workspace" -) -``` - -Faster startup, no build time required. - -## Lifecycle Management - -### Automatic Cleanup - -```python -with DockerWorkspace(base_image="python:3.12") as workspace: - # Container created - workspace.execute_command("pip install requests") - # Commands execute in container -# Container automatically stopped and removed -``` - -### Manual Management - -```python -workspace = DockerWorkspace(base_image="python:3.12") - -# Manually start (happens automatically in context manager) -# Use workspace -result = workspace.execute_command("ls") - -# Manually cleanup -workspace.__exit__(None, None, None) -``` - -## Environment Configuration - -### Forward Environment Variables - -```python -import os - -os.environ["DATABASE_URL"] = "postgres://..." -os.environ["API_KEY"] = "secret" - -workspace = DockerWorkspace( - base_image="python:3.12", - forward_env=["DATABASE_URL", "API_KEY", "DEBUG"] -) - -with workspace: - result = workspace.execute_command("echo $DATABASE_URL") - # Outputs: postgres://... -``` - -### Custom Container Name - -```python -workspace = DockerWorkspace( - base_image="python:3.12", - container_name="my-agent-container" -) -``` - -Useful for debugging and monitoring. - -### Platform Specification - -```python -workspace = DockerWorkspace( - base_image="python:3.12", - platform="linux/amd64" # Force specific platform -) -``` - -Useful for Apple Silicon Macs running amd64 images. 
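Whether such an override is needed can be decided from the host architecture. Below is a plain-Python sketch; the keyword-argument dict and the ARM machine-name mapping are illustrative assumptions, not SDK behavior:

```python
import platform

def needs_amd64_override() -> bool:
    """True on ARM hosts (e.g. Apple Silicon), where running an
    amd64 image typically needs an explicit platform flag."""
    return platform.machine().lower() in ("arm64", "aarch64")

# Hypothetical kwargs for DockerWorkspace -- illustrative only.
kwargs = {"base_image": "python:3.12"}
if needs_amd64_override():
    kwargs["platform"] = "linux/amd64"
print(kwargs)
```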
- -## Port Management - -DockerWorkspace automatically finds available ports for container communication: - -```python -workspace = DockerWorkspace( - base_image="python:3.12", - host_port=None # Auto-assign (default) -) - -# Or specify explicit port -workspace = DockerWorkspace( - base_image="python:3.12", - host_port=8000 # Use specific port -) -``` - -## File Operations - -### File Upload - -```python -workspace.file_upload( - source_path="local_file.txt", - destination_path="/workspace/file.txt" -) -``` - -### File Download - -```python -workspace.file_download( - source_path="/workspace/output.txt", - destination_path="local_output.txt" -) -``` - -## Building Docker Images - -DockerWorkspace can build custom agent server images: - -**Source**: [`openhands/agent_server/docker/build.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/agent_server/docker/build.py) - -```python -from openhands.agent_server.docker.build import ( - BuildOptions, - build -) - -# Build custom image -image_name = build( - BuildOptions( - base_image="ubuntu:22.04", - target="runtime", # or "dev" - platform="linux/amd64", - context_dir="." - ) -) - -# Use built image -workspace = DockerWorkspace(server_image=image_name) -``` - -## Use with Conversation - -```python -from openhands.sdk import Agent, Conversation -from openhands.tools import BashTool, FileEditorTool -from openhands.workspace import DockerWorkspace - -# Create workspace -workspace = DockerWorkspace( - base_image="python:3.12", - working_dir="/workspace" -) - -# Create agent -agent = Agent( - llm=llm, - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Use in conversation -with workspace: - conversation = Conversation(agent=agent, workspace=workspace) - conversation.send_message("Create a Python web scraper") - conversation.run() -``` - -See [`examples/02_remote_agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server) for complete examples. 
- -## Security Benefits - -### Isolation - -- **Process Isolation**: Container runs separately from host -- **File System Isolation**: Limited access to host file system -- **Network Isolation**: Separate network namespace - -### Resource Limits - -```python -# Resource limits are configurable via Docker -# Set through Docker API or Dockerfile -``` - -### Sandboxing - -DockerWorkspace provides strong sandboxing: -- Agent cannot access host file system -- Agent cannot interfere with host processes -- Agent operates in controlled environment - -## Performance Considerations - -### Container Startup Time - -- **Base Image Build**: 30-60 seconds (first time) -- **Pre-built Image**: 5-10 seconds -- **Subsequent Runs**: Uses cached images - -### Optimization Tips - -1. **Use Pre-built Images**: Faster than building from base image -2. **Cache Base Images**: Docker caches layers -3. **Minimize Image Size**: Smaller images start faster -4. **Reuse Containers**: For multiple operations (advanced) - -## Troubleshooting - -### Container Fails to Start - -```bash -# Check Docker is running -docker ps - -# Check logs -docker logs <container-name> - -# Verify image exists -docker images -``` - -### Port Already in Use - -```python -# Specify different port -workspace = DockerWorkspace( - base_image="python:3.12", - host_port=8001 # Use alternative port -) -``` - -### Permission Issues - -```bash -# Ensure Docker has necessary permissions -# On Linux, add user to docker group: -# sudo usermod -aG docker $USER -``` - -## Best Practices - -1. **Use Context Managers**: Always use `with` statement -2. **Pre-build Images**: Build agent server images ahead of time -3. **Set Resource Limits**: Configure appropriate limits -4. **Monitor Containers**: Track resource usage -5. **Clean Up**: Ensure containers are removed after use -6.
**Use Specific Tags**: Pin image versions for reproducibility - -## See Also - -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution -- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server running in container -- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Docker workspace examples diff --git a/sdk/arch/workspace/overview.mdx b/sdk/arch/workspace/overview.mdx deleted file mode 100644 index 6a539776..00000000 --- a/sdk/arch/workspace/overview.mdx +++ /dev/null @@ -1,99 +0,0 @@ ---- -title: Workspace Package Overview -description: Advanced workspace implementations providing sandboxed and remote execution environments. ---- - -The `openhands.workspace` package provides advanced workspace implementations for production deployments, including Docker-based sandboxing and remote API execution. 
- -**Source**: [`openhands/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace) - -## Available Workspaces - -- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker container isolation -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote server execution - -## Workspace Hierarchy - -```mermaid -graph TD - Base[BaseWorkspace] --> Local[LocalWorkspace] - Base --> Remote[RemoteWorkspace] - Remote --> Docker[DockerWorkspace] - Remote --> API[RemoteAPIWorkspace] - - style Base fill:#e1f5fe - style Local fill:#e8f5e8 - style Remote fill:#fff3e0 - style Docker fill:#f3e5f5 - style API fill:#f3e5f5 -``` - -- **BaseWorkspace**: Core interface (in SDK) -- **LocalWorkspace**: Direct local execution (in SDK) -- **RemoteWorkspace**: Base for remote implementations -- **DockerWorkspace**: Docker container execution -- **RemoteAPIWorkspace**: API-based remote execution - -## Comparison - -| Feature | LocalWorkspace | DockerWorkspace | RemoteAPIWorkspace | -|---------|---------------|-----------------|-------------------| -| **Isolation** | None | Strong | Strong | -| **Performance** | Fast | Good | Network latency | -| **Setup** | None | Docker required | Server required | -| **Security** | Host system | Sandboxed | Sandboxed | -| **Use Case** | Development | Production/Testing | Distributed systems | - -## Quick Start - -### Docker Workspace - -```python -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - base_image="ubuntu:22.04" -) - -with workspace: - result = workspace.execute_command("echo 'Hello from Docker'") - print(result.stdout) -``` - -### Remote API Workspace - -```python -from openhands.workspace import RemoteAPIWorkspace - -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com", - api_key="your-api-key" -) - -with workspace: - result = workspace.execute_command("python script.py") - 
print(result.stdout) -``` - -## Use Cases - -### Development -Use `LocalWorkspace` for local development and testing. - -### Testing -Use `DockerWorkspace` for isolated test environments. - -### Production -Use `DockerWorkspace` or `RemoteAPIWorkspace` for production deployments. - -### Multi-User Systems -Use `RemoteAPIWorkspace` with centralized agent server. - -## See Also - -- **[SDK Workspace Interface](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface -- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker implementation -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote API implementation -- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server for remote workspaces diff --git a/sdk/arch/workspace/remote_api.mdx b/sdk/arch/workspace/remote_api.mdx deleted file mode 100644 index cb8ca8a4..00000000 --- a/sdk/arch/workspace/remote_api.mdx +++ /dev/null @@ -1,325 +0,0 @@ ---- -title: RemoteAPIWorkspace -description: Connect to centralized agent servers via HTTP API for scalable distributed agent execution. ---- - -RemoteAPIWorkspace enables agent execution on remote servers through HTTP APIs. It's designed for production deployments requiring centralized agent management and multi-user support. 
- -**Source**: [`openhands/workspace/remote_api/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/remote_api) - -## Overview - -RemoteAPIWorkspace provides: -- HTTP API communication with agent server -- Authentication and authorization -- Centralized resource management -- Multi-user agent execution -- Monitoring and logging - -## Usage - -### Basic Usage - -```python -from openhands.workspace import RemoteAPIWorkspace - -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com", - api_key="your-api-key" -) - -with workspace: - result = workspace.execute_command("python script.py") - print(result.stdout) -``` - -### With Agent - -```python -from openhands.sdk import Agent, Conversation -from openhands.tools import BashTool, FileEditorTool - -# Create workspace -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com", - api_key="your-api-key" -) - -# Create agent -agent = Agent( - llm=llm, - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Use in conversation -conversation = Conversation(agent=agent, workspace=workspace) -conversation.send_message("Your task") -conversation.run() -``` - -## Configuration - -**Source**: [`openhands/workspace/remote_api/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/remote_api/workspace.py) - -### Parameters - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `working_dir` | `str` | Yes | Working directory on server | -| `api_url` | `str` | Yes | Agent server API URL | -| `api_key` | `str` | Yes | Authentication API key | -| `timeout` | `float` | No | Request timeout (default: 30) | - -### Example Configuration - -```python -workspace = RemoteAPIWorkspace( - working_dir="/workspace/user123", - api_url="https://agents.company.com", - api_key="sk-abc123...", - timeout=60.0 # 60 second timeout -) -``` - -## API 
Communication - -### HTTP Endpoints - -RemoteAPIWorkspace communicates with agent server endpoints: - -- `POST /api/workspace/command` - Execute commands -- `POST /api/workspace/upload` - Upload files -- `GET /api/workspace/download` - Download files -- `GET /api/health` - Health check - -### Authentication - -```python -# API key passed in Authorization header -headers = { - "Authorization": f"Bearer {api_key}" -} -``` - -### Error Handling - -```python -try: - result = workspace.execute_command("command") -except ConnectionError: - print("Failed to connect to agent server") -except TimeoutError: - print("Request timed out") -except Exception as e: - print(f"Execution error: {e}") -``` - -## File Operations - -### Upload Files - -```python -workspace.file_upload( - source_path="local_data.csv", - destination_path="/workspace/data.csv" -) -``` - -### Download Files - -```python -workspace.file_download( - source_path="/workspace/results.json", - destination_path="local_results.json" -) -``` - -### Large File Transfer - -```python -# Chunked upload for large files -workspace.file_upload( - source_path="large_dataset.zip", - destination_path="/workspace/dataset.zip" -) -``` - -## Architecture - -```mermaid -graph LR - Client[Client SDK] -->|HTTPS| API[Agent Server API] - API --> Container1[Container 1] - API --> Container2[Container 2] - API --> Container3[Container 3] - - Container1 --> Agent1[Agent] - Container2 --> Agent2[Agent] - Container3 --> Agent3[Agent] - - style Client fill:#e1f5fe - style API fill:#fff3e0 - style Container1 fill:#e8f5e8 - style Container2 fill:#e8f5e8 - style Container3 fill:#e8f5e8 -``` - -## Use Cases - -### Multi-User Platform - -```python -# Each user gets isolated workspace -user_workspace = RemoteAPIWorkspace( - working_dir=f"/workspace/{user_id}", - api_url="https://agents.platform.com", - api_key=user_api_key -) -``` - -### Scalable Agent Execution - -```python -# Server manages resource allocation -# Multiple agents run 
concurrently -# Automatic load balancing -``` - -### Centralized Monitoring - -```python -# Server tracks: -# - Resource usage per user -# - Agent execution logs -# - API usage metrics -# - Error rates and debugging info -``` - -## Security - -### Authentication - -- API key-based authentication -- Per-user access control -- Token expiration and rotation - -### Isolation - -- Separate workspaces per user -- Container-based sandboxing -- Network isolation - -### Data Protection - -- HTTPS communication -- Encrypted data transfer -- Secure file storage - -## Performance Considerations - -### Network Latency - -```python -# Latency depends on: -# - Network connection -# - Geographic distance -# - Server load - -# Optimization: -# - Use regional servers -# - Batch operations -# - Cache frequently accessed data -``` - -### Concurrent Execution - -```python -# Server handles concurrent requests -# Multiple users can run agents simultaneously -# Automatic resource management -``` - -## Deployment - -### Running Agent Server - -See [Agent Server Documentation](/sdk/architecture/agent_server/overview.mdx) for server setup: - -```bash -# Start agent server -docker run -d \ - -p 8000:8000 \ - -e API_KEY=your-secret-key \ - ghcr.io/all-hands-ai/agent-server:latest -``` - -### Using Deployed Server - -```python -# Client connects to deployed server -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://your-server.com", - api_key="your-secret-key" -) -``` - -## Comparison with DockerWorkspace - -| Feature | DockerWorkspace | RemoteAPIWorkspace | -|---------|-----------------|-------------------| -| **Setup** | Local Docker | Remote server | -| **Network** | Local | Internet required | -| **Scaling** | Single machine | Multiple users | -| **Management** | Client-side | Server-side | -| **Latency** | Low | Network dependent | -| **Use Case** | Local dev/test | Production | - -## Best Practices - -1. **Use HTTPS**: Always use secure connections -2. 
**Rotate API Keys**: Regularly update authentication -3. **Set Timeouts**: Configure appropriate timeouts -4. **Handle Network Errors**: Implement retry logic -5. **Monitor Usage**: Track API calls and resource usage -6. **Regional Deployment**: Use nearby servers for lower latency -7. **Batch Operations**: Combine multiple operations when possible - -## Troubleshooting - -### Connection Failures - -```python -# Verify server is reachable -import requests -response = requests.get(f"{api_url}/api/health") -print(response.status_code) # Should be 200 -``` - -### Authentication Errors - -```python -# Verify API key is correct -# Check key has not expired -# Ensure proper authorization headers -``` - -### Timeout Issues - -```python -# Increase timeout for long operations -workspace = RemoteAPIWorkspace( - api_url=api_url, - api_key=api_key, - timeout=120.0 # 2 minutes -) -``` - -## See Also - -- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Local Docker execution -- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server implementation -- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace examples diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx deleted file mode 100644 index 41977f29..00000000 --- a/sdk/guides/github-workflows/pr-review.mdx +++ /dev/null @@ -1,65 +0,0 @@ ---- -title: PR Review Workflow -description: Automate pull request reviews with AI-powered code analysis using GitHub Actions. ---- - - -This example is available on GitHub: [examples/github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/02_pr_review) - - -Automatically review pull requests when labeled, providing comprehensive feedback on code quality, security, and best practices. 
- -## Quick Start - -```bash -# 1. Copy workflow to your repository -cp examples/github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml - -# 2. Configure secrets in GitHub Settings → Secrets -# Add: LLM_API_KEY - -# 3. Create a "review-this" label in your repository -# Go to Issues → Labels → New label -``` - -## Features - -- **Automatic Trigger** - Reviews start when `review-this` label is added -- **Comprehensive Analysis** - Analyzes changes in full repository context -- **Detailed Feedback** - Covers code quality, security, best practices -- **GitHub Integration** - Posts comments directly to the PR - -## Usage - -### Trigger a Review - -1. Open a pull request -2. Add the `review-this` label -3. Wait for the workflow to complete -4. Review feedback posted as PR comments - -## Configuration - -Edit `.github/workflows/pr-review.yml` to customize: - -```yaml -env: - LLM_MODEL: openhands/claude-sonnet-4-5-20250929 - # LLM_BASE_URL: 'https://custom-api.example.com' # Optional -``` - -## Review Coverage - -The agent analyzes: - -- **Code Quality** - Readability, maintainability, patterns -- **Security** - Potential vulnerabilities and risks -- **Best Practices** - Language and framework conventions -- **Improvements** - Specific actionable suggestions -- **Positive Feedback** - Recognition of good practices - -## Related Documentation - -- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/agent_script.py) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/workflow.yml) -- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/prompt.py) diff --git a/sdk/guides/github-workflows/routine-maintenance.mdx b/sdk/guides/github-workflows/routine-maintenance.mdx deleted file mode 100644 index 86b42168..00000000 --- a/sdk/guides/github-workflows/routine-maintenance.mdx +++ /dev/null @@ 
-1,74 +0,0 @@ ---- -title: Routine Maintenance Workflow -description: Automate routine maintenance tasks with GitHub Actions and OpenHands agents. ---- - - -This example is available on GitHub: [examples/github_workflows/01_basic_action/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/01_basic_action) - - -Set up automated or scheduled GitHub Actions workflows to handle routine maintenance tasks like dependency updates, documentation improvements, and code cleanup. - -## Quick Start - -```bash -# 1. Copy workflow to your repository -cp examples/github_workflows/01_basic_action/workflow.yml .github/workflows/maintenance.yml - -# 2. Configure secrets in GitHub Settings → Secrets -# Add: LLM_API_KEY - -# 3. Configure the prompt in workflow.yml -# See below for options -``` - -## Configuration - -### Option A: Direct Prompt - -```yaml -env: - PROMPT_STRING: 'Check for outdated dependencies and create a PR to update them' - LLM_MODEL: openhands/claude-sonnet-4-5-20250929 -``` - -### Option B: Remote Prompt - -```yaml -env: - PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' - LLM_MODEL: openhands/claude-sonnet-4-5-20250929 -``` - -## Usage - -### Manual Trigger - -1. Go to **Actions** → "Maintenance Task" -2. Click **Run workflow** -3. Optionally override prompt settings -4. 
Click **Run workflow** - -### Scheduled Runs - -Uncomment the schedule section in `workflow.yml`: - -```yaml -on: - schedule: - - cron: "0 2 * * *" # Run at 2 AM UTC daily -``` - -## Example Use Cases - -- **Dependency Updates** - Check and update outdated packages -- **Documentation** - Update docs to reflect code changes -- **Test Coverage** - Identify and improve under-tested code -- **Linting** - Apply formatting and linting fixes -- **Link Validation** - Find and report broken links - -## Related Documentation - -- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/agent_script.py) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/workflow.yml) -- [GitHub Actions Cron Syntax](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule) diff --git a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx deleted file mode 100644 index 9f8bef79..00000000 --- a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx +++ /dev/null @@ -1,42 +0,0 @@ ---- -title: API Sandboxed Server -description: Connect to hosted API-based agent server for fully managed infrastructure. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) - - -Connect to a hosted API-based agent server for fully managed infrastructure without running your own server. 
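Before creating a conversation against a hosted server, it can help to verify the endpoint is reachable and to retry transient network failures, as the RemoteAPIWorkspace best practices above recommend. The sketch below is illustrative, not part of the SDK: the `/api/health` path mirrors the endpoint listed in the RemoteAPIWorkspace docs and may differ for your deployment.

```python
import time


def backoff_delays(retries: int = 3, base: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2**i) for i in range(retries)]


def check_health(api_url: str, get=None) -> bool:
    """Poll the server's health endpoint, retrying with backoff.

    `get` defaults to requests.get; it is injectable so the retry
    logic can be exercised without a live server.
    """
    if get is None:
        import requests  # third-party; only needed for real calls

        get = requests.get
    for delay in backoff_delays():
        try:
            if get(f"{api_url}/api/health", timeout=5).status_code == 200:
                return True
        except Exception:
            pass  # connection error or timeout; retry after a pause
        time.sleep(delay)
    return False
```

If `check_health` returns `False`, confirm the server URL and API key before debugging the client side.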
- -## How to Run - -```bash -export LLM_API_KEY="your-api-key" -export AGENT_SERVER_URL="https://api.openhands.ai" -export AGENT_SERVER_API_KEY="your-server-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py -``` - -## Key Concept - -```python -conversation = RemoteConversation( - agent_server_url="https://api.openhands.ai", - api_key=server_api_key -) -``` - -No server management required - connect to hosted API. - -## Benefits - -- **Zero Ops** - No server management -- **Scalability** - Auto-scaling infrastructure -- **Reliability** - Managed uptime and monitoring - -## Related Documentation - -- [Agent Server Architecture](/sdk/arch/agent_server/overview) -- [Remote Workspace](/sdk/arch/workspace/remote_api) diff --git a/sdk/guides/remote-agent-server/browser-with-docker.mdx b/sdk/guides/remote-agent-server/browser-with-docker.mdx deleted file mode 100644 index a3230976..00000000 --- a/sdk/guides/remote-agent-server/browser-with-docker.mdx +++ /dev/null @@ -1,44 +0,0 @@ ---- -title: Browser with Docker Sandboxed Server -description: Use browser tools with Docker-sandboxed agent server for web automation. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) - - -Combine browser automation capabilities with Docker isolation for secure web interaction. 
- -## How to Run - -```bash -# Start server with browser support -docker run -p 8000:8000 \ - -e LLM_API_KEY="your-api-key" \ - ghcr.io/all-hands-ai/runtime:latest-browser - -# Run client -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py -``` - -## Key Concept - -```python -conversation = RemoteConversation(agent_server_url="http://localhost:8000") -conversation.send_message("Navigate to GitHub and search for OpenHands") -``` - -Browser tools run in an isolated Docker container with the agent. - -## Benefits - -- **Secure Browsing** - Isolate web interactions -- **Clean Environment** - Fresh browser state for each session -- **Resource Control** - Limit browser resource usage - -## Related Documentation - -- [Browser Tool](/sdk/arch/tools/browser_use) -- [Docker Workspace](/sdk/arch/workspace/docker) diff --git a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx deleted file mode 100644 index 8c7967e3..00000000 --- a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx +++ /dev/null @@ -1,184 +0,0 @@ ---- -title: Docker Workspace & Sandboxed Server -description: Run agents in isolated Docker containers for security and reproducibility. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) - - -Docker workspaces provide complete isolation by running agents in containers. Use them for production deployments, testing, and untrusted code execution. - -## DockerWorkspace - -Execute in isolated Docker containers with security boundaries.
- -### Direct Usage - -```python -from openhands.workspace import DockerWorkspace -from openhands.sdk import Conversation - -workspace = DockerWorkspace( - working_dir="/workspace", - base_image="python:3.12" -) - -with workspace: - conversation = Conversation(agent=agent, workspace=workspace) - conversation.send_message("Build a web server") - conversation.run() -# Container automatically cleaned up -``` - -See [`01_docker_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_docker_workspace.py) - -### When to Use - -- **Production** - Isolated execution environment -- **Testing** - Clean, reproducible environments -- **Untrusted code** - Run agent in sandbox -- **Multi-user** - Each user gets isolated container - -### Configuration Options - -```python -DockerWorkspace( - working_dir="/workspace", - base_image="ubuntu:22.04", # Build from base image - # OR - server_image="ghcr.io/all-hands-ai/agent-server:latest", # Pre-built image - host_port=None, # Auto-assign port - platform="linux/amd64" # Platform override -) -``` - -### Pre-built Images - -Use pre-built images for faster startup: - -```python -workspace = DockerWorkspace( - working_dir="/workspace", - server_image="ghcr.io/all-hands-ai/agent-server:latest" -) -``` - -No build time - container starts immediately. - -### File Transfer - -Copy files to/from container: - -```python -# Upload file -workspace.upload_file("/local/path/file.txt", "/workspace/file.txt") - -# Download file -workspace.download_file("/workspace/output.txt", "/local/path/output.txt") -``` - -## Docker Sandboxed Server - -Run agent server in Docker and connect remotely. 
- -### How to Run - -```bash -# Start server in Docker -docker run -p 8000:8000 \ - -e LLM_API_KEY="your-api-key" \ - ghcr.io/all-hands-ai/runtime:latest - -# Run client -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py -``` - -### Client Connection - -```python -from openhands.sdk import RemoteConversation - -conversation = RemoteConversation( - agent_server_url="http://localhost:8000", - api_key=api_key -) -conversation.send_message("Your task") -conversation.run() -``` - -## Benefits - -**Security:** -- Complete isolation from host system -- Agent cannot access host files -- Agent cannot affect host processes - -**Resources:** -- Control CPU/memory limits -- Monitor container resource usage -- Kill containers if needed - -**Reproducibility:** -- Consistent environment across deployments -- Version-controlled container images -- Easy rollback to previous versions - -## Docker vs Local Workspace - -| Feature | LocalWorkspace | DockerWorkspace | -|---------|----------------|-----------------| -| **Security** | Low (host access) | High (isolated) | -| **Setup** | None | Docker required | -| **Performance** | Fast | Slight overhead | -| **Cleanup** | Manual | Automatic | -| **Best for** | Development | Production | - -## Best Practices - -### 1. Use Pre-built Images - -```python -# ✅ Good: Fast startup -server_image="ghcr.io/all-hands-ai/agent-server:latest" - -# ❌ Slow: Builds on every run -base_image="python:3.12" -``` - -### 2. Clean Up Containers - -Use context manager for automatic cleanup: - -```python -with workspace: - # Work with workspace - pass -# Container automatically removed -``` - -### 3. Resource Limits - -Set Docker resource limits: - -```bash -docker run --memory="2g" --cpus="1.5" \ - ghcr.io/all-hands-ai/runtime:latest -``` - -### 4.
Volume Mounts - -Mount local directories for persistent data: - -```bash -docker run -v /local/data:/workspace/data \ - ghcr.io/all-hands-ai/runtime:latest -``` - -## Related Documentation - -- **[Browser with Docker](/sdk/guides/remote-agent-server/browser-with-docker)** - Browser in container -- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design -- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details diff --git a/sdk/guides/remote-agent-server/local-agent-server.mdx b/sdk/guides/remote-agent-server/local-agent-server.mdx deleted file mode 100644 index c08c9c8d..00000000 --- a/sdk/guides/remote-agent-server/local-agent-server.mdx +++ /dev/null @@ -1,91 +0,0 @@ ---- -title: Local Agent Server & Workspaces -description: Understand workspaces and run agent server locally for client-server architecture. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) - - -Workspaces define where agents execute commands and access files. This guide introduces workspace concepts and demonstrates the local agent server setup. - -## Workspace Types - -| Type | Security | Setup | Use Case | -|------|----------|-------|----------| -| **LocalWorkspace** | Low (host access) | None | Development | -| **DockerWorkspace** | High (isolated) | Docker | Testing, Production | -| **RemoteAPIWorkspace** | High (isolated) | Server | Multi-user, Cloud | - -## LocalWorkspace - -Execute directly on your machine - the default for the standalone SDK. - -### Usage - -```python -from openhands.sdk import Conversation - -# LocalWorkspace is implicit (no workspace parameter needed) -conversation = Conversation(agent=agent) -conversation.send_message("Create a Python script") -conversation.run() -``` - -Operations run in the current working directory with direct host access.
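Independent of the SDK, the following sketch makes "direct host access" concrete: under LocalWorkspace a command is just an ordinary host process running in the working directory, so its side effects land straight on the host filesystem.

```python
import os
import subprocess
import tempfile

# Stand-in for the working directory an agent would use.
workdir = tempfile.mkdtemp(prefix="local-ws-")

# What a bash-style tool call amounts to locally: a plain subprocess
# executed in the working directory, with your full host permissions.
subprocess.run("echo hello > note.txt", shell=True, cwd=workdir, check=True)

# The file exists directly on the host; there is no sandbox boundary.
path = os.path.join(workdir, "note.txt")
print(open(path).read().strip())
```

This is exactly why the security warning in this guide applies: anything the agent runs can read or modify whatever your user account can.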
- -### When to Use - -- **Development** - Quick iteration and testing -- **Local files** - Direct access to local filesystem -- **Simple tasks** - No isolation needed - -### Security Considerations - -⚠️ **Warning**: Agent has full host access: -- Can modify any accessible files -- Can execute any commands -- **Not recommended for production or untrusted code** - -## Remote Agent Server - -Run agent server and connect remotely for resource isolation and scalability. - -### How to Run - -```bash -# Terminal 1: Start server -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python -m openhands.agent_server - -# Terminal 2: Run client -export LLM_API_KEY="your-api-key" -uv run python examples/02_remote_agent_server/01_convo_with_local_agent_server.py -``` - -### Client Connection - -```python -from openhands.sdk import RemoteConversation - -conversation = RemoteConversation( - agent_server_url="http://localhost:8000", - api_key=api_key -) -conversation.send_message("Your task") -conversation.run() -``` - -### Benefits - -- **Resource Isolation** - Server handles compute-intensive tasks -- **Scalability** - Multiple clients connect to same server -- **Deployment** - Separate client and execution environments -- **Security** - Isolate agent execution from client - -## Related Documentation - -- **[Docker Sandboxed Server](/sdk/guides/remote-agent-server/docker-sandboxed-server)** - Isolated execution -- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details -- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design diff --git a/sdk/guides/remote-agent-server/vscode-with-docker.mdx b/sdk/guides/remote-agent-server/vscode-with-docker.mdx deleted file mode 100644 index 78aa7598..00000000 --- a/sdk/guides/remote-agent-server/vscode-with-docker.mdx +++ /dev/null @@ -1,43 +0,0 @@ ---- -title: VS Code with Docker Sandboxed Server -description: Enable VS Code integration for code editing with Docker-sandboxed agent.
--- - - -This example is available on GitHub: [examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py) - - -Use VS Code tools with Docker-sandboxed agent server for code editing and development workflows. - -## How to Run - -```bash -# Start server with VS Code support -docker run -p 8000:8000 \ - -e LLM_API_KEY="your-api-key" \ - ghcr.io/all-hands-ai/runtime:latest-vscode - -# Run client -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py -``` - -## Key Concept - -```python -conversation = RemoteConversation(agent_server_url="http://localhost:8000") -conversation.send_message("Create a Python Flask app with routes") -``` - -The agent uses VS Code tools for editing, navigation, and refactoring in an isolated environment. - -## Benefits - -- **Rich Code Editing** - VS Code features in agent workflows - -- **Isolated Development** - Safe code changes in container -- **Full IDE Features** - Syntax highlighting, auto-complete, etc.
- -## Related Documentation - -- [Docker Workspace](/sdk/arch/workspace/docker) From f37b3e1475fd54f654aaeb044ed631cc397d1c5e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 10:16:22 -0400 Subject: [PATCH 21/58] fix typo --- sdk/guides/mcp.mdx | 4 ++-- sdk/guides/security.mdx | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/sdk/guides/mcp.mdx b/sdk/guides/mcp.mdx index 334f0e70..c874dda1 100644 --- a/sdk/guides/mcp.mdx +++ b/sdk/guides/mcp.mdx @@ -103,7 +103,7 @@ for i, message in enumerate(llm_messages): print(f"Message {i}: {str(message)[:200]}") ``` -```bash Running the example +```bash Running the Example export LLM_API_KEY="your-api-key" cd agent-sdk uv run python examples/01_standalone_sdk/07_mcp_integration.py @@ -216,7 +216,7 @@ for i, message in enumerate(llm_messages): print(f"Message {i}: {str(message)[:200]}") ``` -```bash Running the example +```bash Running the Example export LLM_API_KEY="your-api-key" cd agent-sdk uv run python examples/01_standalone_sdk/08_mcp_with_oauth.py diff --git a/sdk/guides/security.mdx b/sdk/guides/security.mdx index cddea4a0..a0b1601d 100644 --- a/sdk/guides/security.mdx +++ b/sdk/guides/security.mdx @@ -7,7 +7,7 @@ Agent actions can be controlled through two complementary mechanisms: **confirma ## Confirmation Policy -Confirmation policy control whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission for actions. +Confirmation policy controls whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission for actions. 
Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) From 45f3c0bac2c2d009798f2e22ed8862a734fa5fb9 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Wed, 22 Oct 2025 14:16:46 +0000 Subject: [PATCH 22/58] docs: sync code blocks from agent-sdk examples Synced from agent-sdk ref: main --- sdk/guides/convo-pause-and-resume.mdx | 4 ++-- sdk/guides/llm-image-input.mdx | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/sdk/guides/convo-pause-and-resume.mdx b/sdk/guides/convo-pause-and-resume.mdx index b9f56d89..b4768998 100644 --- a/sdk/guides/convo-pause-and-resume.mdx +++ b/sdk/guides/convo-pause-and-resume.mdx @@ -74,8 +74,8 @@ thread = threading.Thread(target=conversation.run) thread.start() # Let the agent work for a few seconds -print("Letting agent work for 5 seconds...") -time.sleep(5) +print("Letting agent work for 2 seconds...") +time.sleep(2) # Phase 2: Pause the agent print() diff --git a/sdk/guides/llm-image-input.mdx b/sdk/guides/llm-image-input.mdx index 1092b24e..597c9747 100644 --- a/sdk/guides/llm-image-input.mdx +++ b/sdk/guides/llm-image-input.mdx @@ -51,6 +51,7 @@ llm = LLM( base_url=base_url, api_key=SecretStr(api_key), ) +assert llm.vision_is_active(), "The selected LLM model does not support vision input." 
cwd = os.getcwd() @@ -86,7 +87,6 @@ IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/log conversation.send_message( Message( role="user", - vision_enabled=True, content=[ TextContent( text=( From 645093893a6f777e0fda7c58973e63755c9dbc1c Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 10:39:01 -0400 Subject: [PATCH 23/58] make llm routing expandable --- sdk/guides/llm-routing.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/llm-routing.mdx b/sdk/guides/llm-routing.mdx index 17a78ed2..b76c392f 100644 --- a/sdk/guides/llm-routing.mdx +++ b/sdk/guides/llm-routing.mdx @@ -11,7 +11,7 @@ This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing. Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: -```python icon="python" examples/01_standalone_sdk/19_llm_routing.py +```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py import os from pydantic import SecretStr From d8890e18566cd39d5d924d9bb99682da78009e39 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 10:47:26 -0400 Subject: [PATCH 24/58] Revert "remove stuff not ready yet" This reverts commit a651e927f7fc9634f1e99d62ced16fd5ed92be80. 
--- sdk/arch/agent_server/overview.mdx | 433 ++++++++++++++++ sdk/arch/overview.mdx | 142 +++++ sdk/arch/sdk/agent.mdx | 301 +++++++++++ sdk/arch/sdk/condenser.mdx | 166 ++++++ sdk/arch/sdk/conversation.mdx | 487 ++++++++++++++++++ sdk/arch/sdk/event.mdx | 403 +++++++++++++++ sdk/arch/sdk/llm.mdx | 416 +++++++++++++++ sdk/arch/sdk/mcp.mdx | 333 ++++++++++++ sdk/arch/sdk/microagents.mdx | 225 ++++++++ sdk/arch/sdk/security.mdx | 416 +++++++++++++++ sdk/arch/sdk/tool.mdx | 199 +++++++ sdk/arch/sdk/workspace.mdx | 322 ++++++++++++ sdk/arch/tools/bash.mdx | 288 +++++++++++ sdk/arch/tools/browser_use.mdx | 101 ++++ sdk/arch/tools/file_editor.mdx | 338 ++++++++++++ sdk/arch/tools/glob.mdx | 89 ++++ sdk/arch/tools/grep.mdx | 140 +++++ sdk/arch/tools/overview.mdx | 185 +++++++ sdk/arch/tools/planning_file_editor.mdx | 128 +++++ sdk/arch/tools/task_tracker.mdx | 146 ++++++ sdk/arch/workspace/docker.mdx | 330 ++++++++++++ sdk/arch/workspace/overview.mdx | 99 ++++ sdk/arch/workspace/remote_api.mdx | 325 ++++++++++++ sdk/guides/github-workflows/pr-review.mdx | 65 +++ .../github-workflows/routine-maintenance.mdx | 74 +++ .../api-sandboxed-server.mdx | 42 ++ .../browser-with-docker.mdx | 44 ++ .../docker-sandboxed-server.mdx | 184 +++++++ .../local-agent-server.mdx | 91 ++++ .../vscode-with-docker.mdx | 43 ++ 30 files changed, 6555 insertions(+) create mode 100644 sdk/arch/agent_server/overview.mdx create mode 100644 sdk/arch/overview.mdx create mode 100644 sdk/arch/sdk/agent.mdx create mode 100644 sdk/arch/sdk/condenser.mdx create mode 100644 sdk/arch/sdk/conversation.mdx create mode 100644 sdk/arch/sdk/event.mdx create mode 100644 sdk/arch/sdk/llm.mdx create mode 100644 sdk/arch/sdk/mcp.mdx create mode 100644 sdk/arch/sdk/microagents.mdx create mode 100644 sdk/arch/sdk/security.mdx create mode 100644 sdk/arch/sdk/tool.mdx create mode 100644 sdk/arch/sdk/workspace.mdx create mode 100644 sdk/arch/tools/bash.mdx create mode 100644 sdk/arch/tools/browser_use.mdx create mode 100644 
sdk/arch/tools/file_editor.mdx create mode 100644 sdk/arch/tools/glob.mdx create mode 100644 sdk/arch/tools/grep.mdx create mode 100644 sdk/arch/tools/overview.mdx create mode 100644 sdk/arch/tools/planning_file_editor.mdx create mode 100644 sdk/arch/tools/task_tracker.mdx create mode 100644 sdk/arch/workspace/docker.mdx create mode 100644 sdk/arch/workspace/overview.mdx create mode 100644 sdk/arch/workspace/remote_api.mdx create mode 100644 sdk/guides/github-workflows/pr-review.mdx create mode 100644 sdk/guides/github-workflows/routine-maintenance.mdx create mode 100644 sdk/guides/remote-agent-server/api-sandboxed-server.mdx create mode 100644 sdk/guides/remote-agent-server/browser-with-docker.mdx create mode 100644 sdk/guides/remote-agent-server/docker-sandboxed-server.mdx create mode 100644 sdk/guides/remote-agent-server/local-agent-server.mdx create mode 100644 sdk/guides/remote-agent-server/vscode-with-docker.mdx diff --git a/sdk/arch/agent_server/overview.mdx b/sdk/arch/agent_server/overview.mdx new file mode 100644 index 00000000..593cb646 --- /dev/null +++ b/sdk/arch/agent_server/overview.mdx @@ -0,0 +1,433 @@ +--- +title: Agent Server +description: HTTP server for remote agent execution with Docker-based sandboxing and API access. +--- + +The Agent Server provides HTTP API endpoints for remote agent execution. It enables centralized agent management, multi-user support, and production deployments. 
+ +**Source**: [`openhands/agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server) + +## Purpose + +The Agent Server enables: +- **Remote Execution**: Run agents on dedicated servers +- **Multi-User Support**: Isolate execution per user +- **Resource Management**: Centralized resource allocation +- **API Access**: HTTP API for agent operations +- **Production Deployment**: Scalable agent infrastructure + +## Architecture + +```mermaid +graph TD + Client[Client SDK] -->|HTTPS| Server[Agent Server] + Server --> Router[FastAPI Router] + + Router --> Workspace[Workspace API] + Router --> Health[Health Check] + + Workspace --> Docker[Docker Manager] + Docker --> Container1[Container 1] + Docker --> Container2[Container 2] + + style Client fill:#e1f5fe + style Server fill:#fff3e0 + style Router fill:#e8f5e8 + style Docker fill:#f3e5f5 +``` + +## Quick Start + +### Using Pre-built Docker Image + +```bash +# Pull latest image +docker pull ghcr.io/all-hands-ai/agent-server:latest + +# Run server +docker run -d \ + -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest +``` + +### Using Python + +```bash +# Install agent-server package +pip install openhands-agent-server + +# Start server +openhands-agent-server +``` + +## Building Docker Images + +**Source**: [`openhands/agent_server/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server/docker) + +### Build Script + +```bash +# Build from source +python -m openhands.agent_server.docker.build \ + --base-image ubuntu:22.04 \ + --target runtime \ + --platform linux/amd64 +``` + +### Build Options + +| Option | Description | Default | +|--------|-------------|---------| +| `--base-image` | Base Docker image | `ubuntu:22.04` | +| `--target` | Build target (`runtime` or `dev`) | `runtime` | +| `--platform` | Target platform | Host platform | +| `--output-image` | Output image name | Auto-generated | + +### 
Programmatic Build + +```python +from openhands.agent_server.docker.build import ( + BuildOptions, + build +) + +# Build custom image +image_name = build( + BuildOptions( + base_image="python:3.12", + target="runtime", + platform="linux/amd64" + ) +) + +print(f"Built image: {image_name}") +``` + +## Docker Images + +### Official Images + +```bash +# Latest release +ghcr.io/all-hands-ai/agent-server:latest + +# Specific version +ghcr.io/all-hands-ai/agent-server:v1.0.0 + +# Development build +ghcr.io/all-hands-ai/agent-server:dev +``` + +### Image Variants + +- **`runtime`**: Production-ready, minimal size +- **`dev`**: Development tools included + +## API Endpoints + +### Health Check + +```bash +GET /api/health +``` + +Returns server health status. + +### Execute Command + +```bash +POST /api/workspace/command +Content-Type: application/json +Authorization: Bearer <api-key> + +{ + "command": "python script.py", + "working_dir": "/workspace", + "timeout": 30.0 +} +``` + +### File Upload + +```bash +POST /api/workspace/upload +Authorization: Bearer <api-key> +Content-Type: multipart/form-data + +# Form data with file +``` + +### File Download + +```bash +GET /api/workspace/download?path=/workspace/output.txt +Authorization: Bearer <api-key> +``` + +## Configuration + +### Environment Variables + +```bash +# Server configuration +export HOST=0.0.0.0 +export PORT=8000 +export API_KEY=your-secret-key + +# Docker configuration +export DOCKER_HOST=unix:///var/run/docker.sock + +# Logging +export LOG_LEVEL=INFO +export DEBUG=false +``` + +### Server Settings + +```python +# config.py +class Settings: + host: str = "0.0.0.0" + port: int = 8000 + api_key: str = "your-secret-key" + workers: int = 4 + timeout: float = 300.0 +``` + +## Deployment + +### Docker Compose + +```yaml +# docker-compose.yml +version: '3.8' + +services: + agent-server: + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - "8000:8000" + volumes: + - /var/run/docker.sock:/var/run/docker.sock + environment: + -
API_KEY=your-secret-key + - LOG_LEVEL=INFO + restart: unless-stopped +``` + +### Kubernetes + +```yaml +# deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: agent-server +spec: + replicas: 3 + selector: + matchLabels: + app: agent-server + template: + metadata: + labels: + app: agent-server + spec: + containers: + - name: agent-server + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - containerPort: 8000 + env: + - name: API_KEY + valueFrom: + secretKeyRef: + name: agent-server-secrets + key: api-key +``` + +### Systemd Service + +```ini +# /etc/systemd/system/agent-server.service +[Unit] +Description=OpenHands Agent Server +After=docker.service +Requires=docker.service + +[Service] +Type=simple +ExecStart=/usr/bin/docker run \ + --rm \ + -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest + +Restart=always +RestartSec=10 + +[Install] +WantedBy=multi-user.target +``` + +## Security + +### Authentication + +```python +# API key authentication +from fastapi import Header, HTTPException + +async def verify_api_key(authorization: str = Header(None)): + if not authorization or not authorization.startswith("Bearer "): + raise HTTPException(status_code=401) + + api_key = authorization.split(" ")[1] + if api_key != expected_api_key: + raise HTTPException(status_code=403) +``` + +### Container Isolation + +- Each request executes in a separate Docker container +- Containers have resource limits +- Network isolation between containers +- Automatic cleanup after execution + +### Rate Limiting + +```python +# Implement rate limiting per API key +from slowapi import Limiter + +limiter = Limiter(key_func=lambda request: request.headers.get("Authorization")) + +@app.post("/api/workspace/command") +@limiter.limit("100/minute") +async def execute_command(...): + ...
+``` + +## Monitoring + +### Health Checks + +```bash +# Check if server is running +curl http://localhost:8000/api/health + +# Response: +# {"status": "healthy", "version": "1.0.0"} +``` + +### Logging + +```python +# Structured logging +import logging + +logger = logging.getLogger("agent_server") +logger.info("Request received", extra={ + "user_id": user_id, + "command": command, + "duration": duration +}) +``` + +### Metrics + +Track important metrics: +- Request rate and latency +- Container creation/cleanup time +- Resource usage per container +- Error rates and types + +## Troubleshooting + +### Server Won't Start + +```bash +# Check port availability +netstat -tuln | grep 8000 + +# Check Docker socket +docker ps + +# Check logs +docker logs agent-server +``` + +### Container Creation Fails + +```bash +# Verify Docker permissions +docker run hello-world + +# Check Docker socket mount +ls -la /var/run/docker.sock + +# Check available resources +docker stats +``` + +### Performance Issues + +```bash +# Check resource usage +docker stats + +# Increase worker count +export WORKERS=8 + +# Optimize container startup +# Use pre-built images +# Reduce image size +``` + +## Best Practices + +1. **Use Pre-built Images**: Faster startup, consistent environment +2. **Set Resource Limits**: Prevent resource exhaustion +3. **Enable Monitoring**: Track performance and errors +4. **Implement Rate Limiting**: Prevent abuse +5. **Secure API Keys**: Use strong, rotated keys +6. **Use HTTPS**: Encrypt data in transit +7. **Regular Updates**: Keep images updated +8. 
**Backup Configuration**: Version control configurations + +## Development + +### Running Locally + +```bash +# Clone repository +git clone https://github.com/All-Hands-AI/agent-sdk.git +cd agent-sdk + +# Install dependencies +pip install -e ".[server]" + +# Run development server +uvicorn openhands.agent_server.main:app --reload +``` + +### Testing + +```bash +# Run tests +pytest openhands/agent_server/tests/ + +# Test specific endpoint +curl -X POST http://localhost:8000/api/workspace/command \ + -H "Authorization: Bearer test-key" \ + -H "Content-Type: application/json" \ + -d '{"command": "echo test", "working_dir": "/workspace"}' +``` + +## See Also + +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based local execution +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Client for agent server +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Server usage examples +- **[FastAPI Documentation](https://fastapi.tiangolo.com/)** - Web framework used diff --git a/sdk/arch/overview.mdx b/sdk/arch/overview.mdx new file mode 100644 index 00000000..6662ba9b --- /dev/null +++ b/sdk/arch/overview.mdx @@ -0,0 +1,142 @@ +--- +title: Overview +description: A modular framework for building AI agents, organized into four packages for clarity and extensibility. +--- + +The OpenHands Agent SDK is organized into four packages, each serving a distinct purpose in the agent development lifecycle. + +## Package Structure + +```mermaid +graph TD + SDK[SDK Package<br/>Core Framework] --> Tools[Tools Package<br/>Built-in Tools] + SDK --> Workspace[Workspace Package<br/>Execution Environments] + SDK --> AgentServer[Agent Server Package<br/>Remote Execution] + + Tools -.->|Used by| SDK + Workspace -.->|Used by| SDK + AgentServer -.->|Hosts| SDK + + style SDK fill:#e1f5fe + style Tools fill:#e8f5e8 + style Workspace fill:#fff3e0 + style AgentServer fill:#f3e5f5 +``` + +## 1. SDK Package + +Core framework for building agents locally. + +**Key Components:** +- **[Tool System](/sdk/architecture/sdk/tool)** - Define custom capabilities +- **[Microagents](/sdk/architecture/sdk/microagents)** - Specialized behavior modules +- **[Condenser](/sdk/architecture/sdk/condenser)** - Memory management +- **[Agent](/sdk/architecture/sdk/agent)** - Base agent interface +- **[Workspace](/sdk/architecture/sdk/workspace)** - Execution abstraction +- **[Conversation](/sdk/architecture/sdk/conversation)** - Lifecycle management +- **[Event](/sdk/architecture/sdk/event)** - Event system +- **[LLM](/sdk/architecture/sdk/llm)** - Language model integration +- **[MCP](/sdk/architecture/sdk/mcp)** - Model Context Protocol +- **[Security](/sdk/architecture/sdk/security)** - Security framework + +## 2. Tools Package + +Production-ready tool implementations. + +**Available Tools:** +- **[BashTool](/sdk/architecture/tools/bash)** - Command execution +- **[FileEditorTool](/sdk/architecture/tools/file_editor)** - File manipulation +- **[GlobTool](/sdk/architecture/tools/glob)** - File discovery +- **[GrepTool](/sdk/architecture/tools/grep)** - Content search +- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker)** - Task management +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor)** - Multi-file workflows +- **[BrowserUseTool](/sdk/architecture/tools/browser_use)** - Web interaction + +## 3. Workspace Package + +Advanced execution environments for production.
+ +**Workspace Types:** +- **[DockerWorkspace](/sdk/architecture/workspace/docker)** - Container-based isolation +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api)** - Remote server execution + +See [Workspace Overview](/sdk/architecture/workspace/overview) for comparison. + +## 4. Agent Server Package + +HTTP server for centralized agent execution. + +**Capabilities:** +- Remote agent execution via API +- Multi-user isolation +- Container management +- Resource allocation + +See [Agent Server Documentation](/sdk/architecture/agent_server/overview). + +## Component Interaction + +```mermaid +graph LR + User[User] -->|Message| Conv[Conversation] + Conv -->|Manages| Agent[Agent] + + Agent -->|Reasons with| LLM[LLM] + Agent -->|Executes| Tools[Tools] + Agent -->|Guided by| Micro[Microagents] + + Tools -->|Run in| Workspace[Workspace] + + style User fill:#e1f5fe + style Conv fill:#fff3e0 + style Agent fill:#f3e5f5 + style LLM fill:#e8f5e8 + style Tools fill:#fce4ec + style Workspace fill:#e0f2f1 +``` + +## Design Principles + +### Immutability & Serialization +All core classes are: +- **Immutable**: State changes create new instances +- **Serializable**: Full conversation state can be saved/restored +- **Type-safe**: Pydantic models ensure data integrity + +### Modularity +- **Composable**: Mix and match components as needed +- **Extensible**: Add custom tools, LLMs, or workspaces +- **Testable**: Each component can be tested in isolation + +### Backward Compatibility +- **Semantic versioning** indicates compatibility levels +- **Migration guides** provided for major changes + +## Getting Started + +New to the SDK? 
Start with the guides: + +- **[Getting Started](/sdk/guides/getting-started)** - Quick introduction +- **[Streaming Mode](/sdk/guides/streaming-mode)** - Execution patterns +- **[Tools & MCP](/sdk/guides/tools-and-mcp)** - Extending capabilities +- **[Workspaces](/sdk/guides/workspaces)** - Execution environments +- **[Sub-agents](/sdk/guides/subagents)** - Agent delegation + +## Deep Dive + +Explore individual components: + +- **SDK Package** - [Tool](/sdk/architecture/sdk/tool) | [Agent](/sdk/architecture/sdk/agent) | [LLM](/sdk/architecture/sdk/llm) | [Conversation](/sdk/architecture/sdk/conversation) +- **Tools Package** - [BashTool](/sdk/architecture/tools/bash) | [FileEditorTool](/sdk/architecture/tools/file_editor) +- **Workspace Package** - [DockerWorkspace](/sdk/architecture/workspace/docker) | [RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api) +- **Agent Server** - [Overview](/sdk/architecture/agent_server/overview) + +## Examples + +Browse the [`examples/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples) directory for practical implementations: + +- **Hello World** - Basic agent usage +- **Custom Tools** - Creating new capabilities +- **Docker Workspace** - Sandboxed execution +- **MCP Integration** - External tool servers +- **Planning Agent** - Multi-step workflows diff --git a/sdk/arch/sdk/agent.mdx b/sdk/arch/sdk/agent.mdx new file mode 100644 index 00000000..3c0da066 --- /dev/null +++ b/sdk/arch/sdk/agent.mdx @@ -0,0 +1,301 @@ +--- +title: Agent +description: Core orchestrator combining language models with tools to execute tasks through structured reasoning loops. +--- + +The Agent orchestrates LLM reasoning with tool execution to solve tasks. It manages the reasoning loop, system prompts, and state transitions while maintaining conversation context. 
+ +**Source**: [`openhands/sdk/agent/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent) + +## Core Concepts + +```mermaid +graph TD + Agent[Agent] --> LLM[LLM] + Agent --> Tools[Tools] + Agent --> Context[AgentContext] + Agent --> Condenser[Condenser] + + Context --> Microagents[Microagents] + Tools --> Bash[BashTool] + Tools --> FileEditor[FileEditorTool] + Tools --> MCP[MCP Tools] + + style Agent fill:#e1f5fe + style LLM fill:#fff3e0 + style Tools fill:#e8f5e8 + style Context fill:#f3e5f5 +``` + +An agent combines: +- **LLM**: Language model for reasoning and decision-making +- **Tools**: Capabilities to interact with the environment +- **Context**: Additional knowledge and specialized expertise +- **Condenser**: Memory management for long conversations + +## Base Interface + +**Source**: [`openhands/sdk/agent/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/base.py) + +### AgentBase + +Abstract base class defining the agent interface: + +```python +from openhands.sdk.agent import AgentBase +from openhands.sdk.conversation import ConversationState + +class CustomAgent(AgentBase): + def step(self, state: ConversationState) -> ConversationState: + """Execute one reasoning step and return updated state.""" + # Your agent logic here + return updated_state +``` + +**Key Properties**: +- **Immutable**: Agents are frozen Pydantic models +- **Serializable**: Full agent configuration can be saved/restored +- **Type-safe**: Strict type checking with Pydantic validation + +## Agent Implementation + +**Source**: [`openhands/sdk/agent/agent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/agent.py) + +### Initialization Arguments + +```python +from openhands.sdk import Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from pydantic import SecretStr + +agent = Agent( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ), + tools=[ + 
BashTool.create(), + FileEditorTool.create() + ], + mcp_config={}, # Optional MCP configuration + filter_tools_regex=None, # Optional regex to filter tools + agent_context=None, # Optional context with microagents + condenser=None, # Optional context condenser + security_analyzer=None, # Optional security analyzer + confirmation_policy=None, # Optional confirmation policy +) +``` + +### Key Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `llm` | `LLM` | Language model configuration (required) | +| `tools` | `list[Tool]` | Tools available to the agent | +| `mcp_config` | `dict` | MCP server configuration for external tools | +| `filter_tools_regex` | `str` | Regex to filter available tools | +| `agent_context` | `AgentContext` | Additional context and microagents | +| `condenser` | `CondenserBase` | Context condensation strategy | +| `security_analyzer` | `SecurityAnalyzer` | Security risk analysis | +| `confirmation_policy` | `ConfirmationPolicy` | Action confirmation strategy | + +## Agent Lifecycle + +```mermaid +sequenceDiagram + participant User + participant Conversation + participant Agent + participant LLM + participant Tools + + User->>Conversation: Start conversation + Conversation->>Agent: Initialize state + loop Until task complete + Conversation->>Agent: step(state) + Agent->>LLM: Generate response + LLM->>Agent: Tool calls + reasoning + Agent->>Tools: Execute actions + Tools->>Agent: Observations + Agent->>Conversation: Updated state + end + Conversation->>User: Final result +``` + +### Execution Flow + +1. **Initialization**: Create agent with LLM and tools +2. **State Setup**: Pass agent to conversation +3. **Reasoning Loop**: Conversation calls `agent.step(state)` repeatedly +4. **Tool Execution**: Agent executes tool calls from LLM +5. **State Updates**: Agent returns updated conversation state +6. 
**Termination**: Loop ends when agent calls `FinishTool` + +## Usage Examples + +### Basic Agent + +See [`examples/01_standalone_sdk/01_hello_world.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py): + +```python +from openhands.sdk import Agent, LLM, Conversation +from openhands.tools import BashTool, FileEditorTool +from pydantic import SecretStr + +# Create LLM +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") +) + +# Create agent +agent = Agent( + llm=llm, + tools=[ + BashTool.create(), + FileEditorTool.create() + ] +) + +# Use with conversation +conversation = Conversation(agent=agent) +conversation.send_message("Your task here") +conversation.run() +``` + +### Agent with Context + +See [`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py): + +```python +from openhands.sdk import Agent, AgentContext + +# Create context with microagents +context = AgentContext( + microagents=["testing_expert", "code_reviewer"] +) + +agent = Agent( + llm=llm, + tools=tools, + agent_context=context +) +``` + +### Agent with Memory Management + +See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): + +```python +from openhands.sdk.context import LLMCondenser + +condenser = LLMCondenser( + max_tokens=8000, + target_tokens=6000 +) + +agent = Agent( + llm=llm, + tools=tools, + condenser=condenser +) +``` + +### Agent with MCP Tools + +See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py): + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } +} + +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config +)
+``` + +### Planning Agent Workflow + +See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) for a complete example of multi-phase agent workflows. + +## System Prompts + +**Source**: [`openhands/sdk/agent/prompts/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent/prompts) + +Agents use Jinja2 templates for system prompts. Available templates: + +| Template | Use Case | Source | +|----------|----------|--------| +| `system_prompt.j2` | Default reasoning and tool usage | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt.j2) | +| `system_prompt_interactive.j2` | Interactive conversations | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_interactive.j2) | +| `system_prompt_long_horizon.j2` | Complex multi-step tasks | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_long_horizon.j2) | +| `system_prompt_planning.j2` | Planning-focused workflows | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_planning.j2) | + +### Custom Prompts + +Create custom agent classes with specialized prompts: + +```python +class PlanningAgent(Agent): + system_prompt_filename: str = "system_prompt_planning.j2" +``` + +## Custom Agent Development + +### Extending AgentBase + +```python +from openhands.sdk.agent import AgentBase +from openhands.sdk.conversation import ConversationState + +class SpecializedAgent(AgentBase): + # Custom configuration + max_iterations: int = 10 + + def step(self, state: ConversationState) -> ConversationState: + # Custom reasoning logic + # Tool selection and execution + # State management + return updated_state +``` + +### Multi-Agent Composition + +```python +class WorkflowAgent(AgentBase): + planning_agent: Agent + 
execution_agent: Agent + + def step(self, state: ConversationState) -> ConversationState: + # Phase 1: Planning + plan = self.planning_agent.step(state) + + # Phase 2: Execution + result = self.execution_agent.step(plan) + + return result +``` + +## Best Practices + +1. **Tool Selection**: Provide only necessary tools to reduce complexity +2. **Clear Instructions**: Use detailed user messages for better task understanding +3. **Context Management**: Use condensers for long-running conversations +4. **Error Handling**: Implement proper error recovery strategies +5. **Security**: Use confirmation policies for sensitive operations +6. **Testing**: Test agents with various scenarios and edge cases + +## See Also + +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Defining and using tools +- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing agent conversations +- **[LLM](/sdk/architecture/sdk/llm.mdx)** - Language model configuration +- **[MCP](/sdk/architecture/sdk/mcp.mdx)** - External tool integration +- **[Security](/sdk/architecture/sdk/security.mdx)** - Security and confirmation policies diff --git a/sdk/arch/sdk/condenser.mdx b/sdk/arch/sdk/condenser.mdx new file mode 100644 index 00000000..59d59da6 --- /dev/null +++ b/sdk/arch/sdk/condenser.mdx @@ -0,0 +1,166 @@ +--- +title: Context Condenser +description: Manage agent memory by intelligently compressing conversation history when approaching token limits. +--- + +The context condenser manages agent memory by intelligently compressing conversation history when approaching token limits. This enables agents to maintain coherent context in long-running conversations without exceeding LLM context windows. + +**Source**: [`openhands/sdk/context/condenser/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/condenser) + +## Why Context Condensation? 
+ +```mermaid +graph LR + A[Long Conversation] --> B{Token Limit?} + B -->|Approaching| C[Condense] + B -->|Within Limit| D[Continue] + C --> E[Compressed Context] + E --> F[Agent with Memory] + D --> F + + style A fill:#e1f5fe + style C fill:#fff3e0 + style E fill:#e8f5e8 + style F fill:#f3e5f5 +``` + +As conversations grow, they may exceed LLM context windows. Condensers solve this by: +- Summarizing older messages while preserving key information +- Maintaining recent context in full detail +- Reducing token count without losing conversation coherence + +## LLM Condenser (Default) + +**Source**: [`openhands/sdk/context/condenser/llm_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/condenser/llm_condenser.py) + +The default condenser uses an LLM to intelligently summarize conversation history. + +### How It Works + +1. **Monitor Token Count**: Tracks conversation token usage +2. **Trigger Condensation**: Activates when approaching token threshold +3. **Summarize History**: Uses LLM to compress older messages +4. **Preserve Recent**: Keeps recent messages uncompressed +5. 
**Update Context**: Replaces verbose history with summary + +### Configuration + +```python +from openhands.sdk.context import LLMCondenser + +condenser = LLMCondenser( + max_tokens=8000, # Trigger condensation at this limit + target_tokens=6000, # Reduce to this token count + preserve_recent=10 # Keep last N messages uncompressed +) + +agent = Agent( + llm=llm, + tools=tools, + condenser=condenser +) +``` + +### Example Usage + +See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): + +```python +from openhands.sdk import Agent, LLM +from openhands.sdk.context import LLMCondenser +from pydantic import SecretStr + +# Configure condenser +condenser = LLMCondenser( + max_tokens=8000, + target_tokens=6000 +) + +# Create agent with condenser +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") +) + +agent = Agent( + llm=llm, + tools=tools, + condenser=condenser +) +``` + +## Condensation Strategy + +### Multi-Phase Approach + +```mermaid +sequenceDiagram + participant Agent + participant Condenser + participant LLM + + Agent->>Condenser: Check token count + Condenser->>Condenser: Exceeds threshold? 
+ Condenser->>LLM: Summarize old messages + LLM->>Condenser: Summary + Condenser->>Agent: Updated context + Agent->>Agent: Continue with condensed history +``` + +### What Gets Condensed + +- **System messages**: Preserved as-is +- **Recent messages**: Kept in full (configurable count) +- **Older messages**: Summarized into compact form +- **Tool results**: Preserved for reference +- **User preferences**: Maintained across condensation + +## Custom Condensers + +Implement custom condensation strategies by extending the base class: + +```python +from openhands.sdk.context import CondenserBase +from openhands.sdk.conversation import ConversationState + +class CustomCondenser(CondenserBase): + def condense(self, state: ConversationState) -> ConversationState: + """Implement custom condensation logic.""" + # Your condensation algorithm + return condensed_state + + def should_condense(self, state: ConversationState) -> bool: + """Determine when to trigger condensation.""" + # Your trigger logic + return token_count > threshold +``` + +## Best Practices + +1. **Set Appropriate Thresholds**: Leave buffer room below actual limit +2. **Preserve Recent Context**: Keep enough messages for coherent flow +3. **Monitor Performance**: Track condensation frequency and effectiveness +4. **Test Condensation**: Verify important information isn't lost +5.
**Adjust Per Use Case**: Different tasks need different settings + +## Configuration Guidelines + +| Use Case | max_tokens | target_tokens | preserve_recent | +|----------|-----------|---------------|-----------------| +| Short tasks | 4000 | 3000 | 5 | +| Medium conversations | 8000 | 6000 | 10 | +| Long-running agents | 16000 | 12000 | 20 | +| Code-heavy tasks | 12000 | 10000 | 15 | + +## Performance Considerations + +- **Condensation Cost**: Uses additional LLM calls +- **Latency**: Brief pause during condensation +- **Context Quality**: Trade-off between compression and information retention +- **Frequency**: Tune thresholds to minimize condensation events + +## See Also + +- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using condensers with agents +- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py)** - Working example +- **[Conversation State](/sdk/architecture/sdk/conversation.mdx)** - Managing conversation state diff --git a/sdk/arch/sdk/conversation.mdx b/sdk/arch/sdk/conversation.mdx new file mode 100644 index 00000000..e702fb36 --- /dev/null +++ b/sdk/arch/sdk/conversation.mdx @@ -0,0 +1,487 @@ +--- +title: Conversation +description: Manage agent lifecycles through structured message flows and state persistence. +--- + +The Conversation class orchestrates agent execution through structured message flows. It manages the agent lifecycle and state persistence, and provides APIs for interaction and monitoring.
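At its heart this orchestration is a bounded step loop: keep calling the agent until it reports completion or an iteration cap is hit. A schematic sketch with stand-in types (`StubAgent` and `State` here are illustrative only, not SDK classes):

```python
from dataclasses import dataclass, field

@dataclass
class State:
    events: list[str] = field(default_factory=list)
    finished: bool = False

class StubAgent:
    """Toy agent that declares itself finished after two actions."""
    def step(self, state: State) -> State:
        events = state.events + [f"action-{len(state.events)}"]
        return State(events=events, finished=len(events) >= 2)

def run(agent: StubAgent, state: State, max_iteration_per_run: int = 500) -> State:
    # The conversation drives the agent until it finishes or hits the cap.
    for _ in range(max_iteration_per_run):
        if state.finished:
            break
        state = agent.step(state)
    return state

final = run(StubAgent(), State())
assert final.finished and final.events == ["action-0", "action-1"]
```

The iteration cap mirrors the `max_iteration_per_run` parameter documented below; it is the safety net against runaway loops.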
+ +**Source**: [`openhands/sdk/conversation/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/conversation) + +## Core Concepts + +```mermaid +graph LR + User[User] --> Conversation[Conversation] + Conversation --> Agent[Agent] + Conversation --> State[ConversationState] + Conversation --> Events[Event History] + + Agent --> Step["step()"] + State --> Persistence[Persistence] + + style Conversation fill:#e1f5fe + style Agent fill:#f3e5f5 + style State fill:#fff3e0 + style Events fill:#e8f5e8 +``` + +A conversation: +- **Manages Agent Lifecycle**: Initializes and runs agents until completion +- **Handles State**: Maintains conversation history and context +- **Enables Interaction**: Send messages and receive responses +- **Provides Persistence**: Save and restore conversation state +- **Monitors Progress**: Track execution stats and events + +## Basic API + +**Source**: [`openhands/sdk/conversation/conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation.py) + +### Creating a Conversation + +```python +from openhands.sdk import Conversation, Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from pydantic import SecretStr + +# Create agent +agent = Agent( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ), + tools=[BashTool.create(), FileEditorTool.create()] +) + +# Create conversation +conversation = Conversation( + agent=agent, + workspace="workspace/project", # Working directory + persistence_dir="conversations", # Save conversation state + max_iteration_per_run=500, # Max steps per run + stuck_detection=True, # Detect infinite loops + visualize=True # Generate execution visualizations +) +``` + +### Constructor Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `agent` | `AgentBase` | *Required* | Agent to run in the conversation | +| `workspace` | `str \| LocalWorkspace \|
RemoteWorkspace` | `"workspace/project"` | Execution environment | +| `persistence_dir` | `str \| None` | `None` | Directory for saving state | +| `conversation_id` | `ConversationID \| None` | `None` | Resume existing conversation | +| `callbacks` | `list[ConversationCallbackType] \| None` | `None` | Event callbacks | +| `max_iteration_per_run` | `int` | `500` | Maximum steps per `run()` call | +| `stuck_detection` | `bool` | `True` | Enable stuck detection | +| `visualize` | `bool` | `True` | Generate visualizations | +| `secrets` | `dict \| None` | `None` | Secret values for agent | + +## Agent Lifecycle + +```mermaid +sequenceDiagram + participant User + participant Conversation + participant Agent + participant State + + User->>Conversation: Create conversation(agent) + Conversation->>State: Initialize state + Conversation->>Agent: init_state() + + User->>Conversation: send_message("Task") + Conversation->>State: Add message event + + User->>Conversation: run() + loop Until agent finishes or max iterations + Conversation->>Agent: step(state) + Agent->>State: Update with actions/observations + Conversation->>User: Callback with events + end + + User->>Conversation: agent_final_response() + Conversation->>User: Return final result +``` + +### 1. Create Agent + +Define agent with LLM and tools: + +```python +agent = Agent(llm=llm, tools=tools) +``` + +### 2. Create Conversation + +Pass agent to conversation: + +```python +conversation = Conversation(agent=agent) +``` + +### 3. Send Messages + +Add user messages to conversation: + +```python +conversation.send_message("Build a web scraper for news articles") +``` + +### 4. Run Agent + +Execute agent until task completion: + +```python +conversation.run() +``` + +The conversation will call `agent.step(state)` repeatedly until: +- Agent calls `FinishTool` +- Maximum iterations reached +- Agent encounters an error +- User pauses execution + +### 5. 
Get Results
+
+Retrieve the agent's final response:
+
+```python
+result = conversation.agent_final_response()
+print(result)
+```
+
+## Core Methods
+
+**Source**: [`openhands/sdk/conversation/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/base.py)
+
+### send_message()
+
+Add a message to the conversation:
+
+```python
+# String message
+conversation.send_message("Write unit tests for the API")
+
+# Message object with images
+from openhands.sdk.llm import Message, TextContent, ImageContent
+
+message = Message(
+    role="user",
+    content=[
+        TextContent(text="What's in this image?"),
+        ImageContent(image_urls=["path/to/image.png"])
+    ]
+)
+conversation.send_message(message)
+```
+
+See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py).
+
+### run()
+
+Execute the agent until completion or max iterations:
+
+```python
+# Blocks until the agent finishes, pauses, or hits the iteration limit
+conversation.run()
+```
+
+See [`examples/01_standalone_sdk/11_async.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) for async usage.
+
+### agent_final_response()
+
+Get the agent's final response:
+
+```python
+final_response = conversation.agent_final_response()
+```
+
+### pause()
+
+Pause agent execution:
+
+```python
+conversation.pause()
+```
+
+See [`examples/01_standalone_sdk/09_pause_example.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py).
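`pause()` is usually called from a different thread than the one blocked in `run()`. The cooperative pattern can be sketched with the stdlib alone (a toy stand-in, not the SDK's implementation):

```python
import threading
import time

class MiniConversation:
    """Toy stand-in for Conversation.run()/pause() (illustrative only)."""

    def __init__(self, max_iterations: int = 100):
        self.max_iterations = max_iterations
        self.steps_taken = 0
        self._pause_requested = threading.Event()

    def pause(self) -> None:
        # Signal the run loop to stop at the next step boundary
        self._pause_requested.set()

    def run(self) -> None:
        for _ in range(self.max_iterations):
            if self._pause_requested.is_set():
                break
            self.steps_taken += 1
            time.sleep(0.01)  # simulate one agent step

conv = MiniConversation()
worker = threading.Thread(target=conv.run)
worker.start()
time.sleep(0.05)
conv.pause()   # request a pause from the main thread
worker.join()  # run() returns after the in-flight step completes
```

As in this sketch, pausing takes effect at a step boundary rather than mid-action, so the current LLM call or tool execution finishes first.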
+
+### close()
+
+Clean up resources:
+
+```python
+conversation.close()
+```
+
+## Conversation State
+
+**Source**: [`openhands/sdk/conversation/state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/state.py)
+
+### Accessing State
+
+```python
+state = conversation.state
+
+# Conversation properties
+print(state.id)            # Unique conversation ID
+print(state.agent_status)  # Current execution status
+print(state.events)        # Event history
+
+# Agent and workspace
+print(state.agent)      # The agent instance
+print(state.workspace)  # The workspace
+```
+
+### Agent Execution Status
+
+```python
+from openhands.sdk.conversation.state import AgentExecutionStatus
+
+status = state.agent_status
+
+# Possible values:
+# - AgentExecutionStatus.IDLE
+# - AgentExecutionStatus.RUNNING
+# - AgentExecutionStatus.FINISHED
+# - AgentExecutionStatus.ERROR
+# - AgentExecutionStatus.PAUSED
+```
+
+## Persistence
+
+### Saving Conversations
+
+Conversations are automatically persisted when `persistence_dir` is set:
+
+```python
+conversation = Conversation(
+    agent=agent,
+    persistence_dir="conversations"  # Saves to conversations/{conversation_id}/
+)
+```
+
+See [`examples/01_standalone_sdk/10_persistence.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py).
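Conceptually the layout is one subdirectory per conversation under `persistence_dir`. A stdlib-only sketch of that scheme (the SDK's actual on-disk format differs; the file name here is illustrative):

```python
import json
import tempfile
import uuid
from pathlib import Path

def save_events(persistence_dir: str, conversation_id: str, events: list) -> Path:
    """Write events under persistence_dir/{conversation_id}/ (toy format)."""
    conv_dir = Path(persistence_dir) / conversation_id
    conv_dir.mkdir(parents=True, exist_ok=True)
    path = conv_dir / "events.json"
    path.write_text(json.dumps(events))
    return path

def load_events(persistence_dir: str, conversation_id: str) -> list:
    """Read events back for the given conversation ID."""
    return json.loads((Path(persistence_dir) / conversation_id / "events.json").read_text())

base = tempfile.mkdtemp()
conv_id = str(uuid.uuid4())
save_events(base, conv_id, [{"role": "user", "content": "Task"}])
restored = load_events(base, conv_id)
```

Because state is keyed by the conversation ID, any later process that knows the ID and the directory can rebuild the history.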
+
+### Resuming Conversations
+
+Resume from a saved conversation ID:
+
+```python
+from openhands.sdk.conversation.types import ConversationID
+
+# Get conversation ID
+conv_id: ConversationID = conversation.id
+
+# Later, resume with the same ID
+resumed_conversation = Conversation(
+    agent=agent,
+    conversation_id=conv_id,
+    persistence_dir="conversations"
+)
+```
+
+## Monitoring and Stats
+
+**Source**: [`openhands/sdk/conversation/conversation_stats.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation_stats.py)
+
+### Conversation Stats
+
+```python
+stats = conversation.conversation_stats
+
+print(stats.total_messages)  # Total messages exchanged
+print(stats.total_tokens)    # Total tokens used
+print(stats.total_cost)      # Estimated cost
+print(stats.duration)        # Execution time
+```
+
+See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py).
+
+## Event Callbacks
+
+### Registering Callbacks
+
+Monitor conversation events in real-time:
+
+```python
+from openhands.sdk.event import (
+    ActionEvent,
+    Event,
+    MessageEvent,
+    ObservationEvent,
+)
+
+def on_event(event: Event):
+    if isinstance(event, MessageEvent):
+        print(f"Message: {event.content}")
+    elif isinstance(event, ActionEvent):
+        print(f"Action: {event.action.kind}")
+    elif isinstance(event, ObservationEvent):
+        print(f"Observation: {event.observation.kind}")
+
+conversation = Conversation(
+    agent=agent,
+    callbacks=[on_event]
+)
+```
+
+## Advanced Features
+
+### Stuck Detection
+
+**Source**: [`openhands/sdk/conversation/stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/stuck_detector.py)
+
+Automatically detects when agents are stuck in loops:
+
+```python
+conversation = Conversation(
+    agent=agent,
+    stuck_detection=True  # Default: True
+)
+```
+
+See
[`examples/01_standalone_sdk/20_stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py). + +### Secrets Management + +**Source**: [`openhands/sdk/conversation/secrets_manager.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/secrets_manager.py) + +Provide secrets for agent operations: + +```python +conversation = Conversation( + agent=agent, + secrets={ + "API_KEY": "secret-value", + "DATABASE_URL": "postgres://..." + } +) + +# Update secrets during execution +conversation.update_secrets({ + "NEW_TOKEN": "new-value" +}) +``` + +See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). + +### Visualization + +**Source**: [`openhands/sdk/conversation/visualizer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/visualizer.py) + +Generate execution visualizations: + +```python +conversation = Conversation( + agent=agent, + visualize=True # Default: True +) + +# Visualizations saved to workspace/visualizations/ +``` + +### Title Generation + +Generate conversation titles: + +```python +title = conversation.generate_title(max_length=50) +print(f"Conversation: {title}") +``` + +## Local vs Remote Conversations + +### LocalConversation + +**Source**: [`openhands/sdk/conversation/impl/local_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/local_conversation.py) + +Runs agent locally: + +```python +from openhands.sdk.workspace import LocalWorkspace + +conversation = Conversation( + agent=agent, + workspace=LocalWorkspace(working_dir="/project") +) +``` + +### RemoteConversation + +**Source**: [`openhands/sdk/conversation/impl/remote_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/remote_conversation.py) + +Runs agent on remote server: + 
+```python +from openhands.workspace import RemoteAPIWorkspace + +conversation = Conversation( + agent=agent, + workspace=RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com" + ) +) +``` + +## Best Practices + +1. **Set Appropriate Iteration Limits**: Prevent runaway executions +2. **Use Persistence**: Save important conversations for resume/replay +3. **Monitor Events**: Use callbacks for real-time monitoring +4. **Handle Errors**: Check agent status and handle failures gracefully +5. **Clean Up Resources**: Call `close()` when done +6. **Enable Stuck Detection**: Catch infinite loops early +7. **Track Stats**: Monitor token usage and costs + +## Complete Example + +```python +from openhands.sdk import Conversation, Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from pydantic import SecretStr + +# Create agent +agent = Agent( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ), + tools=[BashTool.create(), FileEditorTool.create()] +) + +# Create conversation +conversation = Conversation( + agent=agent, + workspace="workspace/project", + persistence_dir="conversations", + max_iteration_per_run=100 +) + +try: + # Send task + conversation.send_message("Create a simple REST API") + + # Run agent + conversation.run() + + # Get result + result = conversation.agent_final_response() + print(f"Result: {result}") + + # Check stats + stats = conversation.conversation_stats + print(f"Tokens used: {stats.total_tokens}") + print(f"Cost: ${stats.total_cost}") +finally: + # Clean up + conversation.close() +``` + +## See Also + +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration and usage +- **[Events](/sdk/architecture/sdk/event.mdx)** - Event types and handling +- **[Workspace](/sdk/architecture/sdk/workspace.mdx)** - Workspace configuration +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - Usage examples diff 
--git a/sdk/arch/sdk/event.mdx b/sdk/arch/sdk/event.mdx new file mode 100644 index 00000000..a286dab0 --- /dev/null +++ b/sdk/arch/sdk/event.mdx @@ -0,0 +1,403 @@ +--- +title: Event System +description: Structured event types representing agent actions, observations, and system messages in conversations. +--- + +The event system provides structured representations of all interactions in agent conversations. Events enable state management, LLM communication, and real-time monitoring. + +**Source**: [`openhands/sdk/event/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/event) + +## Core Concepts + +```mermaid +graph TD + Event[Event] --> LLMConvertible[LLMConvertibleEvent] + Event --> NonConvertible[Non-LLM Events] + + LLMConvertible --> Action[ActionEvent] + LLMConvertible --> Observation[ObservationEvent] + LLMConvertible --> Message[MessageEvent] + LLMConvertible --> System[SystemPromptEvent] + + NonConvertible --> State[StateUpdateEvent] + NonConvertible --> User[UserActionEvent] + NonConvertible --> Condenser[CondenserEvent] + + style Event fill:#e1f5fe + style LLMConvertible fill:#fff3e0 + style NonConvertible fill:#e8f5e8 +``` + +Events fall into two categories: +- **LLMConvertibleEvent**: Events that become LLM messages +- **Non-LLM Events**: Internal state and control events + +## Base Event Classes + +**Source**: [`openhands/sdk/event/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/base.py) + +### Event + +Base class for all events: + +```python +from openhands.sdk.event import Event + +class Event: + id: str # Unique event identifier + timestamp: str # ISO format timestamp + source: SourceType # Event source (agent/user/system) +``` + +**Properties**: +- **Immutable**: Events are frozen Pydantic models +- **Serializable**: Full event data can be saved/restored +- **Visualizable**: Rich text representation for display + +### LLMConvertibleEvent + +Events that can be converted to LLM messages: + 
+```python +from openhands.sdk.event import LLMConvertibleEvent +from openhands.sdk.llm import Message + +class LLMConvertibleEvent(Event): + def to_llm_message(self) -> Message: + """Convert event to LLM message format.""" + ... +``` + +These events form the conversation history sent to the LLM. + +## LLM-Convertible Events + +### ActionEvent + +**Source**: [`openhands/sdk/event/llm_convertible/action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/action.py) + +Represents actions taken by the agent: + +```python +from openhands.sdk.event import ActionEvent +from openhands.sdk.tool import Action + +class ActionEvent(LLMConvertibleEvent): + action: Action # The action being executed + thought: str # Agent's reasoning (optional) +``` + +**Purpose**: Records what the agent decided to do. + +**Example**: +```python +from openhands.tools import BashAction + +action_event = ActionEvent( + source="agent", + action=BashAction(command="ls -la"), + thought="List files to understand directory structure" +) +``` + +### ObservationEvent + +**Source**: [`openhands/sdk/event/llm_convertible/observation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/observation.py) + +Represents observations from tool execution: + +```python +from openhands.sdk.event import ObservationEvent +from openhands.sdk.tool import Observation + +class ObservationEvent(LLMConvertibleEvent): + observation: Observation # Tool execution result +``` + +**Purpose**: Records the outcome of agent actions. 
+ +**Example**: +```python +from openhands.tools import BashObservation + +observation_event = ObservationEvent( + source="tool", + observation=BashObservation( + output="file1.txt\nfile2.py\n", + exit_code=0 + ) +) +``` + +**Related Events**: +- **AgentErrorEvent**: Agent execution errors +- **UserRejectObservation**: User rejected an action + +### MessageEvent + +**Source**: [`openhands/sdk/event/llm_convertible/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/message.py) + +Represents messages in the conversation: + +```python +from openhands.sdk.event import MessageEvent + +class MessageEvent(LLMConvertibleEvent): + content: str | list # Message content (text or multimodal) + role: str # Role: "user", "assistant", "system" + images_urls: list[str] # Optional image URLs +``` + +**Purpose**: User messages, agent responses, and system messages. + +**Example**: +```python +message_event = MessageEvent( + source="user", + content="Create a web scraper", + role="user" +) +``` + +### SystemPromptEvent + +**Source**: [`openhands/sdk/event/llm_convertible/system.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/system.py) + +Represents system prompts: + +```python +from openhands.sdk.event import SystemPromptEvent + +class SystemPromptEvent(LLMConvertibleEvent): + content: str # System prompt content +``` + +**Purpose**: Provides instructions and context to the agent. + +## Non-LLM Events + +### ConversationStateUpdateEvent + +**Source**: [`openhands/sdk/event/conversation_state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/conversation_state.py) + +Tracks conversation state changes: + +```python +from openhands.sdk.event import ConversationStateUpdateEvent + +class ConversationStateUpdateEvent(Event): + # Internal state update event + # Not sent to LLM +``` + +**Purpose**: Internal tracking of conversation state transitions. 
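Because state updates never become LLM messages, building the model context is simply a filter over the event history. A toy illustration with simplified stand-in classes (not the SDK's types):

```python
class Event:
    """Toy base event."""

class LLMConvertibleEvent(Event):
    """Events in this branch become LLM messages."""

    def __init__(self, content: str):
        self.content = content

    def to_llm_message(self) -> dict:
        return {"role": "user", "content": self.content}

class StateUpdateEvent(Event):
    """Internal-only event; never sent to the LLM."""

history = [LLMConvertibleEvent("hi"), StateUpdateEvent(), LLMConvertibleEvent("run tests")]

# Only LLM-convertible events survive the filter
llm_context = [e.to_llm_message() for e in history if isinstance(e, LLMConvertibleEvent)]
```

The same `isinstance` split is how callbacks and context builders separate the two event categories described above.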
+ +### PauseEvent + +**Source**: [`openhands/sdk/event/user_action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/user_action.py) + +User paused the conversation: + +```python +from openhands.sdk.event import PauseEvent + +class PauseEvent(Event): + pass +``` + +**Purpose**: Signal that user has paused agent execution. + +### Condenser Events + +**Source**: [`openhands/sdk/event/condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/condenser.py) + +Track context condensation: + +#### Condensation + +```python +class Condensation(Event): + content: str # Condensed summary +``` + +**Purpose**: Record the condensed conversation history. + +#### CondensationRequest + +```python +class CondensationRequest(Event): + pass +``` + +**Purpose**: Request context condensation. + +#### CondensationSummaryEvent + +```python +class CondensationSummaryEvent(LLMConvertibleEvent): + content: str # Summary for LLM +``` + +**Purpose**: Provide condensed context to LLM. 
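Taken together, these events implement a simple idea: replace older history with one summary entry and keep recent events verbatim. A minimal sketch of that transformation (a real condenser asks the LLM to write the summary):

```python
def condense(events: list, preserve_recent: int = 2) -> list:
    """Collapse all but the most recent events into a single summary entry."""
    if len(events) <= preserve_recent:
        return events  # nothing to condense
    older = events[:-preserve_recent]
    summary = f"[summary of {len(older)} earlier events]"  # an LLM would produce this
    return [summary] + events[-preserve_recent:]

condensed = condense(["read repo", "ran tests", "fixed bug", "reran tests"], preserve_recent=2)
```

The summary entry plays the role of `CondensationSummaryEvent`: it is what the LLM sees in place of the dropped history.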
+ +## Event Source Types + +**Source**: [`openhands/sdk/event/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/types.py) + +```python +SourceType = Literal["agent", "user", "tool", "system"] +``` + +- **agent**: Events from the agent +- **user**: Events from the user +- **tool**: Events from tool execution +- **system**: System-generated events + +## Event Streams + +### Converting to LLM Messages + +Events are converted to LLM messages for context: + +```python +from openhands.sdk.event import LLMConvertibleEvent + +events = [action_event, observation_event, message_event] +messages = LLMConvertibleEvent.events_to_messages(events) + +# Send to LLM +response = llm.completion(messages=messages) +``` + +### Event Batching + +Multiple actions in a single step are batched: + +```python +# Multi-action events +action1 = ActionEvent(action=BashAction(...)) +action2 = ActionEvent(action=FileEditAction(...)) + +# Converted to single LLM message with multiple tool calls +messages = LLMConvertibleEvent.events_to_messages([action1, action2]) +``` + +## Event Visualization + +Events support rich text visualization: + +```python +from openhands.sdk.event import Event + +event = MessageEvent( + source="user", + content="Hello", + role="user" +) + +# Rich text representation +print(event.visualize) + +# Plain text +print(str(event)) +# Output: MessageEvent (user) +# user: Hello +``` + +## Event Callbacks + +Monitor events in real-time: + +```python +from openhands.sdk import Conversation +from openhands.sdk.event import ( + Event, + ActionEvent, + ObservationEvent, + MessageEvent +) + +def on_event(event: Event): + if isinstance(event, MessageEvent): + print(f"šŸ’¬ Message: {event.content}") + elif isinstance(event, ActionEvent): + print(f"šŸ”§ Action: {event.action.kind}") + elif isinstance(event, ObservationEvent): + print(f"šŸ‘ļø Observation: {event.observation.content}") + +conversation = Conversation( + agent=agent, + callbacks=[on_event] 
+) +``` + +## Event History + +Access conversation event history: + +```python +conversation = Conversation(agent=agent) +conversation.send_message("Task") +conversation.run() + +# Get all events +events = conversation.state.events + +# Filter by type +actions = [e for e in events if isinstance(e, ActionEvent)] +observations = [e for e in events if isinstance(e, ObservationEvent)] +messages = [e for e in events if isinstance(e, MessageEvent)] +``` + +## Serialization + +Events are fully serializable: + +```python +# Serialize event +event_json = event.model_dump_json() + +# Deserialize +from openhands.sdk.event import Event +restored_event = Event.model_validate_json(event_json) +``` + +## Best Practices + +1. **Use Type Guards**: Check event types with `isinstance()` +2. **Handle All Types**: Cover all event types in callbacks +3. **Preserve Immutability**: Never mutate event objects +4. **Log Events**: Keep event history for debugging +5. **Filter Strategically**: Process only relevant events +6. 
**Visualize for Debugging**: Use `event.visualize` for rich output + +## Event Lifecycle + +```mermaid +sequenceDiagram + participant User + participant Conversation + participant Agent + participant Events + + User->>Conversation: send_message() + Conversation->>Events: MessageEvent + + Conversation->>Agent: step() + Agent->>Events: ActionEvent(s) + + Agent->>Tool: Execute + Tool->>Events: ObservationEvent(s) + + Events->>LLM: Convert to messages + LLM->>Agent: Generate response + + Agent->>Events: New ActionEvent(s) +``` + +## See Also + +- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing conversations and event streams +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent execution and event generation +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool actions and observations +- **[Condenser](/sdk/architecture/sdk/condenser.mdx)** - Context condensation events diff --git a/sdk/arch/sdk/llm.mdx b/sdk/arch/sdk/llm.mdx new file mode 100644 index 00000000..3a418d92 --- /dev/null +++ b/sdk/arch/sdk/llm.mdx @@ -0,0 +1,416 @@ +--- +title: LLM Integration +description: Language model integration supporting multiple providers through LiteLLM with built-in retry logic and metrics tracking. +--- + +The LLM class provides a unified interface for language model integration, supporting multiple providers through [LiteLLM](https://docs.litellm.ai/). It handles authentication, retries, metrics tracking, and streaming responses. 
+
+**Source**: [`openhands/sdk/llm/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/llm)
+
+## Core Concepts
+
+```mermaid
+graph LR
+    LLM[LLM] --> Completion["completion()"]
+    LLM --> Metrics[Metrics Tracking]
+    LLM --> Retry[Retry Logic]
+
+    Completion --> Provider[Provider API]
+    Provider --> OpenAI[OpenAI]
+    Provider --> Anthropic[Anthropic]
+    Provider --> Others[Other Providers]
+
+    style LLM fill:#e1f5fe
+    style Completion fill:#fff3e0
+    style Metrics fill:#e8f5e8
+    style Retry fill:#f3e5f5
+```
+
+## Basic Usage
+
+**Source**: [`openhands/sdk/llm/llm.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/llm.py)
+
+### Creating an LLM
+
+```python
+from openhands.sdk import LLM
+from pydantic import SecretStr
+
+# Basic configuration
+llm = LLM(
+    model="anthropic/claude-sonnet-4-20250514",
+    api_key=SecretStr("your-api-key")
+)
+
+# With custom settings
+llm = LLM(
+    model="openai/gpt-4",
+    api_key=SecretStr("your-api-key"),
+    base_url="https://api.openai.com/v1",
+    temperature=0.7,
+    max_tokens=4096,
+    timeout=60.0
+)
+```
+
+### Configuration Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `model` | `str` | `"claude-sonnet-4-20250514"` | Model identifier |
+| `api_key` | `SecretStr \| None` | `None` | API key for authentication |
+| `base_url` | `str \| None` | `None` | Custom API endpoint |
+| `temperature` | `float` | `0.0` | Sampling temperature (0-2) |
+| `max_tokens` | `int \| None` | `None` | Maximum tokens to generate |
+| `timeout` | `float` | `60.0` | Request timeout in seconds |
+| `num_retries` | `int` | `8` | Number of retry attempts |
+| `retry_min_wait` | `int` | `3` | Minimum retry wait (seconds) |
+| `retry_max_wait` | `int` | `60` | Maximum retry wait (seconds) |
+| `retry_multiplier` | `float` | `2.0` | Retry backoff multiplier |
+
+## Generating Completions
+
+### Basic Completion
+
+```python
+from openhands.sdk.llm import Message
+
+messages = [ + Message(role="user", content="What is the capital of France?") +] + +response = llm.completion(messages=messages) +print(response.choices[0].message.content) +# Output: "The capital of France is Paris." +``` + +### With Tool Calling + +```python +from openhands.sdk import Agent +from openhands.tools import BashTool + +# Tools are automatically converted to function schemas +agent = Agent( + llm=llm, + tools=[BashTool.create()] +) + +# LLM receives tool schemas and can call them +``` + +### Streaming Responses + +```python +# Enable streaming +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key"), + stream=True +) + +# Stream response chunks +for chunk in llm.completion(messages=messages): + if chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="") +``` + +## Model Providers + +The SDK supports all providers available in LiteLLM: + +### Anthropic + +```python +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("sk-ant-...") +) +``` + +### OpenAI + +```python +llm = LLM( + model="openai/gpt-4", + api_key=SecretStr("sk-...") +) +``` + +### Azure OpenAI + +```python +llm = LLM( + model="azure/gpt-4", + api_key=SecretStr("your-azure-key"), + api_base="https://your-resource.openai.azure.com", + api_version="2024-02-01" +) +``` + +### Custom Providers + +```python +llm = LLM( + model="custom-provider/model-name", + base_url="https://custom-api.example.com/v1", + api_key=SecretStr("your-api-key") +) +``` + +See [LiteLLM providers](https://docs.litellm.ai/docs/providers) for full list. + +## LLM Registry + +**Source**: Use pre-configured LLM instances from registry. 
See [`examples/01_standalone_sdk/05_use_llm_registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py):
+
+```python
+from openhands.sdk.llm.registry import get_llm
+
+# Get pre-configured LLM
+llm = get_llm(
+    model_name="claude-sonnet-4",
+    # Configuration from environment or defaults
+)
+```
+
+## Metrics and Monitoring
+
+### Tracking Metrics
+
+**Source**: [`openhands/sdk/llm/utils/metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/utils/metrics.py)
+
+```python
+# Get metrics snapshot
+metrics = llm.metrics.snapshot()
+
+print(f"Token usage: {metrics.accumulated_token_usage}")
+print(f"Total cost: ${metrics.accumulated_cost}")
+print(f"Requests: {metrics.total_requests}")
+```
+
+See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py).
+
+### Cost Tracking
+
+```python
+from openhands.sdk.conversation import Conversation
+
+conversation = Conversation(agent=Agent(llm=llm, tools=tools))
+conversation.send_message("Task")
+conversation.run()
+
+# Get conversation stats
+stats = conversation.conversation_stats
+print(f"Total tokens: {stats.total_tokens}")
+print(f"Estimated cost: ${stats.total_cost}")
+```
+
+## Advanced Features
+
+### LLM Routing
+
+**Source**: Route between different LLMs based on criteria.
+ +See [`examples/01_standalone_sdk/19_llm_routing.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py): + +```python +# Use different LLMs for different tasks +fast_llm = LLM(model="openai/gpt-4o-mini", api_key=SecretStr("...")) +powerful_llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key=SecretStr("...")) + +# Route based on task complexity +if task_is_simple: + agent = Agent(llm=fast_llm, tools=tools) +else: + agent = Agent(llm=powerful_llm, tools=tools) +``` + +### Model Reasoning + +**Source**: Access model reasoning from Anthropic thinking blocks and OpenAI responses API. + +See [`examples/01_standalone_sdk/22_model_reasoning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/22_model_reasoning.py): + +```python +# Enable Anthropic extended thinking +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key"), + thinking={"type": "enabled", "budget_tokens": 1000} +) + +# Or use OpenAI responses API for reasoning +llm = LLM( + model="openai/gpt-5-codex", + api_key=SecretStr("your-api-key"), + reasoning_effort="high" +) +``` + +## Error Handling + +### Automatic Retries + +The LLM class automatically retries on transient failures: + +```python +from litellm.exceptions import RateLimitError, APIConnectionError + +# These exceptions trigger automatic retry: +# - APIConnectionError +# - RateLimitError +# - ServiceUnavailableError +# - Timeout +# - InternalServerError + +# Configure retry behavior +llm = LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key"), + num_retries=8, # Number of retries + retry_min_wait=3, # Min wait between retries (seconds) + retry_max_wait=60, # Max wait between retries (seconds) + retry_multiplier=2.0 # Exponential backoff multiplier +) +``` + +### Exception Handling + +```python +from litellm.exceptions import ( + RateLimitError, + ContextWindowExceededError, + BadRequestError 
+)
+
+try:
+    response = llm.completion(messages=messages)
+except RateLimitError:
+    print("Rate limit exceeded, automatic retry in progress")
+except ContextWindowExceededError:
+    print("Context window exceeded, reduce message history")
+except BadRequestError as e:
+    print(f"Bad request: {e}")
+```
+
+## Message Types
+
+**Source**: [`openhands/sdk/llm/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py)
+
+### Text Messages
+
+```python
+from openhands.sdk.llm import Message, TextContent
+
+message = Message(
+    role="user",
+    content=[TextContent(text="Hello, how are you?")]
+)
+```
+
+### Multimodal Messages
+
+```python
+from openhands.sdk.llm import Message, TextContent, ImageContent
+
+message = Message(
+    role="user",
+    content=[
+        TextContent(text="What's in this image?"),
+        ImageContent(image_urls=["path/to/image.png"])
+    ]
+)
+```
+
+See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py).
+
+### Tool Call Messages
+
+```python
+from openhands.sdk.llm import Message, MessageToolCall, TextContent
+
+# Message with tool calls
+message = Message(
+    role="assistant",
+    content=[TextContent(text="Let me run that command")],
+    tool_calls=[
+        MessageToolCall(
+            id="call_123",
+            function={"name": "execute_bash", "arguments": '{"command": "ls"}'}
+        )
+    ]
+)
+```
+
+## Model Features
+
+### Vision Support
+
+```python
+from litellm.utils import supports_vision
+
+if supports_vision(llm.model):
+    # Model supports image inputs
+    message = Message(
+        role="user",
+        content=[
+            TextContent(text="Describe this image"),
+            ImageContent(image_urls=["image.png"])
+        ]
+    )
+```
+
+### Token Counting
+
+```python
+from litellm.utils import token_counter
+
+# Count tokens in chat-format messages
+messages = [{"role": "user", "content": "Hello world"}]
+tokens = token_counter(model=llm.model, messages=messages)
+print(f"Message uses {tokens} tokens")
+```
+
+### Model Information
+
+```python
+from litellm.utils import get_model_info
+
+info = get_model_info(llm.model)
+print(f"Max
tokens: {info['max_tokens']}") +print(f"Cost per token: {info['input_cost_per_token']}") +``` + +## Best Practices + +1. **Set Appropriate Timeouts**: Adjust based on expected response time +2. **Configure Retries**: Balance reliability with latency requirements +3. **Monitor Metrics**: Track token usage and costs +4. **Handle Exceptions**: Implement proper error handling +5. **Use Streaming**: For better user experience with long responses +6. **Secure API Keys**: Use `SecretStr` and environment variables +7. **Choose Right Model**: Balance cost, speed, and capability + +## Environment Variables + +Configure LLM via environment variables: + +```bash +# API keys +export ANTHROPIC_API_KEY="sk-ant-..." +export OPENAI_API_KEY="sk-..." +export AZURE_API_KEY="..." + +# Custom endpoints +export OPENAI_API_BASE="https://custom-endpoint.com" + +# Model defaults +export LLM_MODEL="anthropic/claude-sonnet-4-20250514" +``` + +## See Also + +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Using LLMs with agents +- **[Message Types](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py)** - Message structure +- **[LiteLLM Documentation](https://docs.litellm.ai/)** - Provider details +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - LLM usage examples diff --git a/sdk/arch/sdk/mcp.mdx b/sdk/arch/sdk/mcp.mdx new file mode 100644 index 00000000..ea18a670 --- /dev/null +++ b/sdk/arch/sdk/mcp.mdx @@ -0,0 +1,333 @@ +--- +title: MCP Integration +description: Connect agents to external tools and services through the Model Context Protocol. +--- + +MCP (Model Context Protocol) integration enables agents to connect to external tools and services through a standardized protocol. The SDK seamlessly converts MCP tools into native agent tools. + +**Source**: [`openhands/sdk/mcp/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/mcp) + +## What is MCP? 
+ +[Model Context Protocol](https://modelcontextprotocol.io/) is an open protocol that standardizes how AI applications connect to external data sources and tools. It enables: + +- **Standardized Integration**: Connect to any MCP-compliant service +- **Dynamic Discovery**: Tools are discovered at runtime +- **Multiple Transports**: Support for stdio, HTTP, and SSE +- **OAuth Support**: Secure authentication for external services + +## Basic Usage + +### Creating MCP Tools + +```python +from openhands.sdk import create_mcp_tools + +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } +} + +# Create MCP tools +mcp_tools = create_mcp_tools(mcp_config, timeout=30) + +# Use with agent +from openhands.sdk import Agent +from openhands.tools import BashTool + +agent = Agent( + llm=llm, + tools=[ + BashTool.create(), + *mcp_tools # Add MCP tools + ] +) +``` + +See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py). + +### Using MCP Config in Agent + +```python +# Simpler: provide MCP config directly to agent +agent = Agent( + llm=llm, + tools=[BashTool.create()], + mcp_config={ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } + } +) +``` + +## Configuration Formats + +The SDK uses the [FastMCP configuration format](https://gofastmcp.com/clients/client#configuration-format). 
+ +### Stdio Servers + +Run local MCP servers via stdio: + +```python +mcp_config = { + "mcpServers": { + "filesystem": { + "transport": "stdio", # Optional, default + "command": "python", + "args": ["./mcp-server-filesystem.py"], + "env": {"DEBUG": "true"}, + "cwd": "/path/to/server" + } + } +} +``` + +### HTTP/SSE Servers + +Connect to remote MCP servers: + +```python +mcp_config = { + "mcpServers": { + "remote_api": { + "transport": "http", # or "sse" + "url": "https://api.example.com/mcp", + "headers": { + "Authorization": "Bearer token" + } + } + } +} +``` + +### OAuth Authentication + +Authenticate with OAuth-enabled services: + +```python +mcp_config = { + "mcpServers": { + "google_drive": { + "transport": "http", + "url": "https://mcp.google.com/drive", + "auth": "oauth", # Enable OAuth flow + } + } +} +``` + +See [`examples/01_standalone_sdk/08_mcp_with_oauth.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py). + +## Available MCP Servers + +Popular MCP servers you can integrate: + +### Official Servers + +- **fetch**: HTTP requests ([mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)) +- **filesystem**: File operations ([mcp-server-filesystem](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem)) +- **git**: Git operations ([mcp-server-git](https://github.com/modelcontextprotocol/servers/tree/main/src/git)) +- **github**: GitHub API ([mcp-server-github](https://github.com/modelcontextprotocol/servers/tree/main/src/github)) +- **postgres**: PostgreSQL queries ([mcp-server-postgres](https://github.com/modelcontextprotocol/servers/tree/main/src/postgres)) + +### Community Servers + +See [MCP Servers Directory](https://github.com/modelcontextprotocol/servers) for more. 
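The configuration formats above share a small set of required fields: stdio servers need a `command`, while HTTP/SSE servers need a `url`. A quick sanity check, sketched here with illustrative logic (not the SDK's actual validation), can catch malformed configs before they are handed to `create_mcp_tools`:

```python
def validate_mcp_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane.

    Illustrative only: based on the stdio/HTTP fields shown above,
    not on the SDK's internal validation logic.
    """
    problems = []
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["config must contain a non-empty 'mcpServers' mapping"]
    for name, entry in servers.items():
        # Stdio is the default transport when none is specified
        transport = entry.get("transport", "stdio")
        if transport == "stdio" and "command" not in entry:
            problems.append(f"{name}: stdio servers need a 'command'")
        elif transport in ("http", "sse") and "url" not in entry:
            problems.append(f"{name}: {transport} servers need a 'url'")
    return problems

config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}}
print(validate_mcp_config(config))  # []
```

Running a check like this at startup gives a clearer error message than a connection timeout from a misconfigured server.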
+ +## MCP Tool Conversion + +MCP tools are automatically converted to SDK tools: + +```mermaid +graph LR + MCPServer[MCP Server] --> Discovery[Tool Discovery] + Discovery --> Schema[Tool Schema] + Schema --> SDKTool[SDK Tool] + SDKTool --> Agent[Agent] + + style MCPServer fill:#e1f5fe + style SDKTool fill:#fff3e0 + style Agent fill:#e8f5e8 +``` + +1. **Discovery**: MCP server lists available tools +2. **Schema Extraction**: Tool schemas extracted from MCP +3. **Tool Creation**: SDK tools created with proper typing +4. **Agent Integration**: Tools available to agent + +## Configuration Options + +### Timeout + +Set connection timeout for MCP servers: + +```python +mcp_tools = create_mcp_tools(mcp_config, timeout=60) # 60 seconds +``` + +### Multiple Servers + +Configure multiple MCP servers: + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "uvx", + "args": ["mcp-server-filesystem"] + }, + "github": { + "command": "uvx", + "args": ["mcp-server-github"] + } + } +} +``` + +All tools from all servers are available to the agent. + +## Error Handling + +```python +try: + mcp_tools = create_mcp_tools(mcp_config, timeout=30) +except TimeoutError: + print("MCP server connection timed out") +except Exception as e: + print(f"Failed to create MCP tools: {e}") + mcp_tools = [] # Continue without MCP tools + +agent = Agent(llm=llm, tools=[*base_tools, *mcp_tools]) +``` + +## Tool Filtering + +Filter MCP tools using regex: + +```python +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + filter_tools_regex="^fetch_.*" # Only tools starting with "fetch_" +) +``` + +## Best Practices + +1. **Set Appropriate Timeouts**: MCP servers may take time to initialize +2. **Handle Failures Gracefully**: Continue with reduced functionality if MCP fails +3. **Use Official Servers**: Start with well-tested MCP servers +4. 
**Secure Credentials**: Use environment variables for sensitive data +5. **Test Locally First**: Verify MCP servers work before deploying +6. **Monitor Performance**: MCP adds latency; monitor its impact +7. **Version Pin**: Specify exact versions of MCP servers + +## Environment Variables + +Configure MCP servers via environment: + +```bash +# GitHub MCP server +export GITHUB_PERSONAL_ACCESS_TOKEN="ghp_..." + +# Google Drive OAuth +export GOOGLE_CLIENT_ID="..." +export GOOGLE_CLIENT_SECRET="..." + +# Custom MCP endpoints +export MCP_FETCH_URL="https://custom-mcp.example.com" +``` + +## Advanced Usage + +### Custom MCP Client + +For advanced control, use the MCP client directly: + +```python +from openhands.sdk.mcp.client import MCPClient + +# Create custom MCP client +client = MCPClient( + server_config={ + "command": "python", + "args": ["./custom-server.py"] + }, + timeout=60 +) + +# Get tools from client +tools = client.list_tools() + +# Use tools with agent +agent = Agent(llm=llm, tools=tools) +``` + +## Debugging + +### Enable Debug Logging + +```python +import logging + +logging.getLogger("openhands.sdk.mcp").setLevel(logging.DEBUG) +``` + +### Verify MCP Server + +Test the MCP server independently: + +```bash +# Run the MCP server directly +uvx mcp-server-fetch + +# For HTTP/SSE servers, check that the endpoint responds +curl http://localhost:3000/mcp/tools +``` + +## Common Issues + +### Server Not Found + +```bash +# Ensure the server is installed. For uvx-based servers: +uvx --help # Check if uvx is available +uvx mcp-server-fetch --help # Check if the server is available +``` + +### Connection Timeout + +```python +# Increase timeout +mcp_tools = create_mcp_tools(mcp_config, timeout=120) +``` + +### OAuth Flow Issues + +- Ensure OAuth credentials are configured +- Check that a browser opens for OAuth consent +- Verify the redirect URL matches the configuration + +## See Also + +- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Official MCP documentation +- **[MCP 
Servers](https://github.com/modelcontextprotocol/servers)** - Official server implementations +- **[FastMCP](https://gofastmcp.com/)** - Configuration format documentation +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - SDK tool system +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py)** - MCP integration examples diff --git a/sdk/arch/sdk/microagents.mdx b/sdk/arch/sdk/microagents.mdx new file mode 100644 index 00000000..00c95dd8 --- /dev/null +++ b/sdk/arch/sdk/microagents.mdx @@ -0,0 +1,225 @@ +--- +title: Microagents +description: Specialized context providers that inject targeted knowledge into agent conversations. +--- + +Microagents are specialized context providers that inject targeted knowledge into agent conversations when specific triggers are detected. They enable modular, reusable expertise without modifying the main agent. + +## What are Microagents? + +Microagents provide focused knowledge or instructions that are dynamically added to the agent's context when relevant keywords are detected in the conversation. This allows agents to access specialized expertise on-demand. + +For a comprehensive guide on using microagents, see the [official microagents documentation](https://docs.all-hands.dev/usage/prompting/microagents-overview). + +**Source**: [`openhands/sdk/context/microagents/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/microagents) + +## Microagent Types + +**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) + +The SDK provides three types of microagents, each serving a distinct purpose: + +### 1. 
KnowledgeMicroagent + +**Source**: [`openhands/sdk/context/microagents/microagent.py#L162`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L162) + +Provides specialized expertise triggered by keywords in conversations. + +**Activation Logic:** +- Contains a list of trigger keywords +- Activated when any trigger appears in conversation +- Case-insensitive matching + +**Use Cases:** +- Language best practices (Python, JavaScript, etc.) +- Framework guidelines (React, Django, etc.) +- Common patterns and anti-patterns +- Tool usage instructions + +**Example:** +```python +from openhands.sdk.context.microagents import KnowledgeMicroagent + +microagent = KnowledgeMicroagent( + name="python_testing", + content="Always use pytest for Python tests...", + triggers=["pytest", "test", "unittest"] +) + +# Triggered when message contains "pytest", "test", or "unittest" +``` + +### 2. RepoMicroagent + +**Source**: [`openhands/sdk/context/microagents/microagent.py#L191`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L191) + +Repository-specific knowledge that's always active when working with a repository. + +**Activation Logic:** +- No triggers required +- Always loaded and active for the repository +- Can define MCP tools configuration + +**Use Cases:** +- Repository-specific guidelines +- Team practices and conventions +- Project-specific workflows +- Custom documentation references + +**Special Files:** +- `.openhands_instructions` - Legacy repo instructions +- `.cursorrules` - Cursor IDE rules (auto-loaded) +- `agents.md` / `agent.md` - Agent instructions (auto-loaded) + +**Example:** +```python +from openhands.sdk.context.microagents import RepoMicroagent + +microagent = RepoMicroagent( + name="project_guidelines", + content="This project uses...", + mcp_tools={"github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]}} +) +``` + +### 3. 
TaskMicroagent + +**Source**: [`openhands/sdk/context/microagents/microagent.py#L236`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L236) + +Specialized KnowledgeMicroagent that requires user input before execution. + +**Activation Logic:** +- Triggered by `/{agent_name}` format +- Prompts user for required inputs +- Processes inputs before injecting knowledge + +**Use Cases:** +- Deployment procedures requiring credentials +- Multi-step workflows with parameters +- Interactive debugging sessions +- Customized task execution + +**Example:** +```python +from openhands.sdk.context.microagents import TaskMicroagent, InputMetadata + +microagent = TaskMicroagent( + name="deploy", + content="Deploy to {environment} with {version}...", + triggers=["/deploy"], + inputs=[ + InputMetadata(name="environment", type="string", required=True), + InputMetadata(name="version", type="string", required=True) + ] +) + +# User types: "/deploy" +# Agent prompts: "Enter environment:" "Enter version:" +# Agent proceeds with filled template +``` + +## How Microagents Work + +```mermaid +sequenceDiagram + participant User + participant Agent + participant Microagent + participant LLM + + User->>Agent: "Run the tests" + Agent->>Agent: Detect keyword "tests" + Agent->>Microagent: Fetch testing microagent + Microagent->>Agent: Return testing guidelines + Agent->>LLM: Context + guidelines + LLM->>Agent: Response with testing knowledge + Agent->>User: Execute tests with guidelines +``` + +## Using Microagents + +### Basic Usage + +```python +from openhands.sdk import Agent, AgentContext + +# Create context with microagents +context = AgentContext( + microagents=["testing_expert", "code_reviewer"] +) + +# Create agent with microagents +agent = Agent( + llm=llm, + tools=tools, + agent_context=context +) +``` + +### Example Integration + +See 
[`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py) for a complete example. + +## Microagent Structure + +**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) + +A microagent consists of: +- **Name**: Unique identifier +- **Triggers**: Keywords that activate the microagent +- **Content**: Knowledge or instructions to inject +- **Type**: One of "knowledge", "repo", or "task" + +## Response Models + +**Source**: [`openhands/sdk/context/microagents/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/types.py) + +### MicroagentKnowledge + +```python +class MicroagentKnowledge(BaseModel): + name: str # Microagent name + trigger: str # Keyword that triggered it + content: str # Injected content +``` + +### MicroagentResponse + +```python +class MicroagentResponse(BaseModel): + name: str # Microagent name + path: str # Path or identifier + created_at: datetime # Creation timestamp +``` + +### MicroagentContentResponse + +```python +class MicroagentContentResponse(BaseModel): + content: str # Full microagent content + path: str # Path or identifier + triggers: list[str] # Trigger keywords + git_provider: str | None # Git source if applicable +``` + +## Benefits + +1. **Modularity**: Separate specialized knowledge from main agent logic +2. **Reusability**: Share microagents across multiple agents +3. **Maintainability**: Update expertise without modifying agent code +4. **Context-Aware**: Only inject relevant knowledge when needed +5. **Composability**: Combine multiple microagents for comprehensive coverage + +## Best Practices + +1. **Clear Triggers**: Use specific, unambiguous trigger keywords +2. **Focused Content**: Keep microagent content concise and targeted +3. 
**Avoid Overlap**: Minimize trigger conflicts between microagents +4. **Version Control**: Store microagents in version-controlled repositories +5. **Documentation**: Document trigger keywords and intended use cases + +## See Also + +- **[Official Microagents Guide](https://docs.all-hands.dev/usage/prompting/microagents-overview)** - Comprehensive documentation +- **[Agent Context](/sdk/architecture/sdk/agent.mdx)** - Using context with agents +- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py)** - Working example diff --git a/sdk/arch/sdk/security.mdx b/sdk/arch/sdk/security.mdx new file mode 100644 index 00000000..a41264fc --- /dev/null +++ b/sdk/arch/sdk/security.mdx @@ -0,0 +1,416 @@ +--- +title: Security +description: Analyze and control agent actions through security analyzers and confirmation policies. +--- + +The security system enables control over agent actions through risk analysis and confirmation policies. It helps prevent dangerous operations while maintaining agent autonomy for safe actions. 
+ +**Source**: [`openhands/sdk/security/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/security) + +## Core Concepts + +```mermaid +graph TD + Action[Agent Action] --> Analyzer[Security Analyzer] + Analyzer --> Risk[Risk Assessment] + Risk --> Policy[Confirmation Policy] + + Policy --> Low{Risk Level} + Low -->|Low| Execute[Execute] + Low -->|Medium| MaybeConfirm[Policy Decision] + Low -->|High| Confirm[Request Confirmation] + + Confirm --> User[User Decision] + User -->|Approve| Execute + User -->|Reject| Block[Block Action] + + style Action fill:#e1f5fe + style Analyzer fill:#fff3e0 + style Policy fill:#e8f5e8 + style Execute fill:#c8e6c9 + style Block fill:#ffcdd2 +``` + +The security system consists of two components: +- **Security Analyzer**: Assesses risk level of actions +- **Confirmation Policy**: Decides when to require user confirmation + +## Security Analyzer + +### LLM Security Analyzer + +**Source**: [`openhands/sdk/security/llm_analyzer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/llm_analyzer.py) + +Uses an LLM to analyze action safety: + +```python +from openhands.sdk.security import LLMSecurityAnalyzer +from openhands.sdk import Agent, LLM +from pydantic import SecretStr + +# Create security analyzer +security_analyzer = LLMSecurityAnalyzer( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ) +) + +# Create agent with security analyzer +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer +) +``` + +### Risk Levels + +**Source**: [`openhands/sdk/security/risk.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/risk.py) + +```python +from openhands.sdk.security.risk import SecurityRisk + +# Risk levels +SecurityRisk.LOW # Safe operations (read files, list directories) +SecurityRisk.MEDIUM # Potentially impactful (write files, API calls) +SecurityRisk.HIGH # Dangerous operations (delete files, 
system changes) +``` + +### How LLM Analyzer Works + +1. **Action Inspection**: Examines the action and its parameters +2. **Context Analysis**: Considers conversation history and workspace +3. **Risk Assessment**: LLM predicts risk level with reasoning +4. **Risk Return**: Returns risk level and explanation + +```python +# Example internal flow +action = BashAction(command="rm -rf /") +risk = security_analyzer.analyze(action, context) +# Returns: SecurityRisk.HIGH, "Attempting to delete entire filesystem" +``` + +### Custom Security Analyzer + +Implement custom risk analysis: + +```python +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.tool import Action + +class PatternBasedAnalyzer(SecurityAnalyzerBase): + dangerous_patterns = ["rm -rf", "sudo", "DROP TABLE"] + + def analyze( + self, + action: Action, + context: dict + ) -> tuple[SecurityRisk, str]: + command = getattr(action, "command", "") + + for pattern in self.dangerous_patterns: + if pattern in command: + return ( + SecurityRisk.HIGH, + f"Dangerous pattern detected: {pattern}" + ) + + return SecurityRisk.LOW, "No dangerous patterns found" +``` + +## Confirmation Policies + +**Source**: [`openhands/sdk/security/confirmation_policy.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/confirmation_policy.py) + +### Built-in Policies + +#### NeverConfirm + +Never request confirmation (default): + +```python +from openhands.sdk.security import NeverConfirm + +agent = Agent( + llm=llm, + tools=tools, + confirmation_policy=NeverConfirm() +) +``` + +#### AlwaysConfirm + +Always request confirmation: + +```python +from openhands.sdk.security import AlwaysConfirm + +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=AlwaysConfirm() +) +``` + +#### ConfirmOnHighRisk + +Confirm only high-risk actions: + +```python +from openhands.sdk.security import 
ConfirmOnHighRisk + +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=ConfirmOnHighRisk() +) +``` + +#### ConfirmOnMediumOrHighRisk + +Confirm medium and high-risk actions: + +```python +from openhands.sdk.security import ConfirmOnMediumOrHighRisk + +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=ConfirmOnMediumOrHighRisk() +) +``` + +### Custom Confirmation Policy + +Implement custom confirmation logic: + +```python +from openhands.sdk.security.confirmation_policy import ConfirmationPolicyBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.tool import Action + +class TimeBasedPolicy(ConfirmationPolicyBase): + """Require confirmation during business hours.""" + + def should_confirm( + self, + action: Action, + risk: SecurityRisk, + context: dict + ) -> bool: + from datetime import datetime + + hour = datetime.now().hour + + # Business hours: always confirm high risk + if 9 <= hour <= 17: + return risk >= SecurityRisk.HIGH + + # Off hours: confirm medium and high risk + return risk >= SecurityRisk.MEDIUM +``` + +## Using Security System + +### Basic Setup + +```python +from openhands.sdk import Agent, LLM, Conversation +from openhands.sdk.security import ( + LLMSecurityAnalyzer, + ConfirmOnHighRisk +) +from pydantic import SecretStr + +# Create analyzer +security_analyzer = LLMSecurityAnalyzer( + llm=LLM( + model="anthropic/claude-sonnet-4-20250514", + api_key=SecretStr("your-api-key") + ) +) + +# Create agent with security +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, + confirmation_policy=ConfirmOnHighRisk() +) + +# Use in conversation +conversation = Conversation(agent=agent) +``` + +See [`examples/01_standalone_sdk/04_human_in_the_loop.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py). 
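The `risk >= SecurityRisk.MEDIUM` comparisons in the policy examples above assume risk levels form an ordered scale. A minimal sketch (an illustrative stand-in, not the SDK's actual `SecurityRisk` class) shows why such comparisons work:

```python
from enum import IntEnum

class SecurityRisk(IntEnum):
    # Illustrative stand-in for the SDK's risk enum; ordering is the point.
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def should_confirm(risk: SecurityRisk) -> bool:
    # The same ordered comparison used by ConfirmOnMediumOrHighRisk-style policies
    return risk >= SecurityRisk.MEDIUM

print([should_confirm(r) for r in SecurityRisk])  # [False, True, True]
```

Because the levels are totally ordered, a policy only needs to pick a threshold rather than enumerate every risk level it cares about.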
+ +### Handling Confirmations + +```python +from openhands.sdk import Conversation +from openhands.sdk.conversation.state import AgentExecutionStatus + +conversation = Conversation(agent=agent) +conversation.send_message("Delete all temporary files") + +# Run agent +conversation.run() + +# Check if waiting for confirmation +if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: + print("Action requires confirmation:") + # Show pending action details + + # User approves + conversation.confirm_pending_action() + conversation.run() + + # Or user rejects + # conversation.reject_pending_action(reason="Too risky") +``` + +### Dynamic Policy Changes + +Change confirmation policy during execution: + +```python +from openhands.sdk.security import AlwaysConfirm, NeverConfirm + +conversation = Conversation(agent=agent) + +# Start with strict policy +conversation.set_confirmation_policy(AlwaysConfirm()) +conversation.send_message("Sensitive task") +conversation.run() + +# Switch to permissive policy +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Safe task") +conversation.run() +``` + +## Security Workflow + +```mermaid +sequenceDiagram + participant Agent + participant Analyzer + participant Policy + participant User + participant Tool + + Agent->>Analyzer: analyze(action) + Analyzer->>Analyzer: Assess risk + Analyzer->>Agent: risk + explanation + + Agent->>Policy: should_confirm(action, risk) + Policy->>Policy: Apply policy rules + + alt No confirmation needed + Policy->>Agent: execute + Agent->>Tool: Execute action + Tool->>Agent: Observation + else Confirmation required + Policy->>User: Request approval + User->>Policy: Approve/Reject + alt Approved + Policy->>Agent: execute + Agent->>Tool: Execute action + else Rejected + Policy->>Agent: block + Agent->>Agent: UserRejectObservation + end + end +``` + +## Best Practices + +1. **Use LLM Analyzer**: Provides nuanced risk assessment +2. 
**Start Conservative**: Begin with strict policies, relax as needed +3. **Monitor Blocked Actions**: Review what's being blocked +4. **Provide Context**: Better context enables better risk assessment +5. **Test Security Setup**: Verify policies work as expected +6. **Document Policies**: Explain confirmation requirements to users +7. **Handle Rejections**: Implement proper error handling for rejected actions + +## Performance Considerations + +### LLM Analyzer Overhead + +LLM security analysis adds latency: +- **Cost**: Additional LLM call per action +- **Latency**: ~1-2 seconds per analysis +- **Tokens**: ~500-1000 tokens per analysis + +```python +# Only use with confirmation policy +agent = Agent( + llm=llm, + tools=tools, + security_analyzer=security_analyzer, # Costs tokens + confirmation_policy=ConfirmOnHighRisk() # Must be used together +) +``` + +### Optimization Tips + +1. **Cache Similar Actions**: Reuse assessments for similar actions +2. **Use Faster Models**: Consider faster LLMs for security analysis +3. **Pattern-Based Pre-Filter**: Use pattern matching before LLM analysis +4. 
**Batch Analysis**: Analyze multiple actions together when possible + +## Security Best Practices + +### Principle of Least Privilege + +```python +# Provide only necessary tools +agent = Agent( + llm=llm, + tools=[ + FileEditorTool.create(), # Safe file operations + # Don't include BashTool for untrusted tasks + ] +) +``` + +### Sandbox Execution + +```python +# Use DockerWorkspace for isolation +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + image="ubuntu:22.04" +) + +conversation = Conversation(agent=agent, workspace=workspace) +``` + +### Secrets Management + +```python +# Provide secrets securely +conversation = Conversation( + agent=agent, + secrets={ + "API_KEY": "secret-value", + "PASSWORD": "secure-password" + } +) +``` + +See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). + +## See Also + +- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration with security +- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Handling confirmations +- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool security considerations +- **[Human-in-the-Loop Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py)** - Complete example diff --git a/sdk/arch/sdk/tool.mdx b/sdk/arch/sdk/tool.mdx new file mode 100644 index 00000000..3bbe737e --- /dev/null +++ b/sdk/arch/sdk/tool.mdx @@ -0,0 +1,199 @@ +--- +title: Tool System +description: Define custom tools for agents to interact with external systems through typed action/observation patterns. +--- + +The tool system enables agents to interact with external systems and perform actions. Tools follow a typed action/observation pattern with comprehensive validation and schema generation. 
+ +## Core Concepts + +```mermaid +graph LR + Action[Action] --> Tool[Tool] + Tool --> Executor[ToolExecutor] + Executor --> Observation[Observation] + + style Action fill:#e1f5fe + style Tool fill:#f3e5f5 + style Executor fill:#fff3e0 + style Observation fill:#e8f5e8 +``` + +A tool consists of three components: +- **Action**: Input schema defining tool parameters +- **ToolExecutor**: Logic that executes the tool +- **Observation**: Output schema with execution results + +**Source**: [`openhands/sdk/tool/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool) + +## Defining Custom Tools + +### 1. Define Action and Observation + +**Source**: [`openhands/sdk/tool/schema.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/schema.py) + +```python +from openhands.sdk.tool import Action, Observation + +class CalculateAction(Action): + """Action to perform calculation.""" + expression: str + precision: int = 2 + +class CalculateObservation(Observation): + """Result of calculation.""" + result: float + success: bool +``` + +### 2. Implement ToolExecutor + +**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py) + +```python +from openhands.sdk.tool import ToolExecutor + +class CalculateExecutor(ToolExecutor[CalculateAction, CalculateObservation]): + def __call__(self, action: CalculateAction) -> CalculateObservation: + try: + result = eval(action.expression) + return CalculateObservation( + result=round(result, action.precision), + success=True + ) + except Exception as e: + return CalculateObservation( + result=0.0, + success=False, + error=str(e) + ) +``` + +### 3. 
Create Tool Class + +```python +from openhands.sdk.tool import Tool + +class CalculateTool(Tool[CalculateAction, CalculateObservation]): + name: str = "calculate" + description: str = "Evaluate mathematical expressions" + action_type: type[Action] = CalculateAction + observation_type: type[Observation] = CalculateObservation + + @classmethod + def create(cls) -> list["CalculateTool"]: + executor = CalculateExecutor() + return [cls().set_executor(executor)] +``` + +### Complete Example + +See [`examples/01_standalone_sdk/02_custom_tools.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) for a working example. + +## Built-in Tools + +**Source**: [`openhands/sdk/tool/builtins/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool/builtins) + +### FinishTool + +**Source**: [`openhands/sdk/tool/builtins/finish.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/finish.py) + +Signals task completion with optional output. + +```python +from openhands.sdk.tool.builtins import FinishTool + +# Automatically included with agents +finish_tool = FinishTool.create() +``` + +### ThinkTool + +**Source**: [`openhands/sdk/tool/builtins/think.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/think.py) + +Enables internal reasoning without external actions. 
+ +```python +from openhands.sdk.tool.builtins import ThinkTool + +# Automatically included with agents +think_tool = ThinkTool.create() +``` + +## Tool Annotations + +**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py) + +Provide hints about tool behavior following [MCP spec](https://modelcontextprotocol.io/): + +```python +from openhands.sdk.tool import ToolAnnotations + +annotations = ToolAnnotations( + title="Calculate", + readOnlyHint=True, # Tool doesn't modify environment + destructiveHint=False, # Tool doesn't perform destructive updates + idempotentHint=True, # Same input produces same output + openWorldHint=False # Tool doesn't interact with external entities +) + +class CalculateTool(Tool[CalculateAction, CalculateObservation]): + annotations: ToolAnnotations = annotations + # ... rest of tool definition +``` + +## Tool Registry + +**Source**: [`openhands/sdk/tool/registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/registry.py) + +Tools are automatically registered when defined. The registry manages tool discovery and schema generation for LLM function calling. 
+ +## Advanced Patterns + +### Stateful Executors + +Executors can maintain state across executions: + +```python +class DatabaseExecutor(ToolExecutor[QueryAction, QueryObservation]): + def __init__(self, connection_string: str): + self.connection = connect(connection_string) + + def __call__(self, action: QueryAction) -> QueryObservation: + result = self.connection.execute(action.query) + return QueryObservation(rows=result.fetchall()) + + def close(self) -> None: + """Clean up resources.""" + self.connection.close() +``` + +### Dynamic Tool Creation + +Create tools with runtime configuration: + +```python +class ConfigurableTool(Tool[MyAction, MyObservation]): + @classmethod + def create(cls, api_key: str, endpoint: str) -> list["ConfigurableTool"]: + executor = MyExecutor(api_key=api_key, endpoint=endpoint) + return [cls().set_executor(executor)] + +# Use with different configurations +tool1 = ConfigurableTool.create(api_key="key1", endpoint="https://api1.com") +tool2 = ConfigurableTool.create(api_key="key2", endpoint="https://api2.com") +``` + +## Best Practices + +1. **Type Safety**: Use Pydantic models for actions and observations +2. **Error Handling**: Always handle exceptions in executors +3. **Resource Management**: Implement `close()` for cleanup +4. **Clear Descriptions**: Provide detailed docstrings for LLM understanding +5. 
**Validation**: Leverage Pydantic validators for input validation + +## See Also + +- **[Pre-defined Tools](/sdk/architecture/tools/)** - Ready-to-use tool implementations +- **[MCP Integration](/sdk/architecture/sdk/mcp.mdx)** - Connect to external MCP tools +- **[Agent Usage](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents diff --git a/sdk/arch/sdk/workspace.mdx b/sdk/arch/sdk/workspace.mdx new file mode 100644 index 00000000..42d61900 --- /dev/null +++ b/sdk/arch/sdk/workspace.mdx @@ -0,0 +1,322 @@ +--- +title: Workspace Interface +description: Abstract interface for agent execution environments supporting local and remote operations. +--- + +The workspace interface defines how agents interact with their execution environment. It provides a unified API for file operations and command execution, supporting both local and remote environments. + +**Source**: [`openhands/sdk/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace) + +## Core Concepts + +```mermaid +graph TD + BaseWorkspace[BaseWorkspace] --> Local[LocalWorkspace] + BaseWorkspace --> Remote[RemoteWorkspace] + + Local --> FileOps[File Operations] + Local --> CmdExec[Command Execution] + + Remote --> Docker[DockerWorkspace] + Remote --> API[RemoteAPIWorkspace] + + style BaseWorkspace fill:#e1f5fe + style Local fill:#e8f5e8 + style Remote fill:#fff3e0 +``` + +A workspace provides: +- **File Operations**: Upload, download, read, write +- **Command Execution**: Run bash commands with timeout support +- **Resource Management**: Context manager protocol for cleanup +- **Flexibility**: Local development or remote sandboxed execution + +## Base Interface + +**Source**: [`openhands/sdk/workspace/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/base.py) + +### BaseWorkspace + +Abstract base class defining the workspace interface: + +```python +from openhands.sdk.workspace import BaseWorkspace + +class 
CustomWorkspace(BaseWorkspace): + working_dir: str # Required: working directory path + + def execute_command( + self, + command: str, + cwd: str | None = None, + timeout: float = 30.0 + ) -> CommandResult: + """Execute bash command.""" + ... + + def file_upload( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + """Upload file to workspace.""" + ... + + def file_download( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + """Download file from workspace.""" + ... +``` + +### Context Manager Protocol + +All workspaces support the context manager protocol for safe resource management: + +```python +with workspace: + result = workspace.execute_command("echo 'hello'") + # Workspace automatically cleans up on exit +``` + +## LocalWorkspace + +**Source**: [`openhands/sdk/workspace/local.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/local.py) + +Executes operations directly on the local machine. + +```python +from openhands.sdk.workspace import LocalWorkspace + +workspace = LocalWorkspace(working_dir="/path/to/project") + +# Execute command +result = workspace.execute_command("ls -la") +print(result.stdout) + +# Upload file (copy) +workspace.file_upload("local_file.txt", "workspace_file.txt") + +# Download file (copy) +workspace.file_download("workspace_file.txt", "local_copy.txt") +``` + +**Use Cases**: +- Local development and testing +- Direct file system access +- No sandboxing required +- Fast execution without network overhead + +## RemoteWorkspace + +**Source**: [`openhands/sdk/workspace/remote/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace/remote) + +Abstract base for remote execution environments. 
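The context-manager discipline that remote workspaces rely on can be sketched in a few lines. `SketchRemoteWorkspace` and its methods are hypothetical stand-ins, not the SDK's implementation: the session is acquired in `__enter__` and released in `__exit__`, so cleanup runs even when a command raises.

```python
# Hypothetical sketch of a remote workspace lifecycle: acquire a session
# on entry, guarantee teardown on exit (normal or exceptional).
class SketchRemoteWorkspace:
    def __init__(self, host: str):
        self.host = host
        self.connected = False

    def __enter__(self):
        # A real implementation would open an HTTP/SSH session here.
        self.connected = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions alike.
        self.connected = False
        return False  # propagate any exception from the body

    def execute_command(self, command: str) -> str:
        if not self.connected:
            raise RuntimeError("workspace not entered")
        return f"ran {command!r} on {self.host}"


with SketchRemoteWorkspace("agent-server.example.com") as ws:
    print(ws.execute_command("echo hello"))  # → ran 'echo hello' on agent-server.example.com
```

The concrete implementations below (Docker, remote API) follow the same shape, with entry and exit doing real container or network work.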
+ +### RemoteWorkspace Mixin + +**Source**: [`openhands/sdk/workspace/remote/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/base.py) + +Provides common functionality for remote workspaces: +- Network communication +- File transfer protocols +- Command execution over API +- Resource cleanup + +### AsyncRemoteWorkspace + +**Source**: [`openhands/sdk/workspace/remote/async_remote_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/async_remote_workspace.py) + +Async version for concurrent operations. + +## Concrete Remote Implementations + +Remote workspace implementations are provided in the `workspace` package: + +### DockerWorkspace + +**Source**: See [workspace/docker documentation](/sdk/architecture/workspace/docker.mdx) + +Executes operations in an isolated Docker container. + +```python +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + image="ubuntu:22.04", + container_name="agent-sandbox" +) + +with workspace: + result = workspace.execute_command("python script.py") +``` + +**Benefits**: +- Strong isolation and sandboxing +- Reproducible environments +- Resource limits and security +- Clean slate for each session + +### RemoteAPIWorkspace + +**Source**: See [workspace/remote_api documentation](/sdk/architecture/workspace/remote_api.mdx) + +Connects to a remote agent server via API. 
+ +```python +from openhands.workspace import RemoteAPIWorkspace + +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com", + api_key="your-api-key" +) + +with workspace: + result = workspace.execute_command("npm test") +``` + +**Benefits**: +- Centralized agent execution +- Shared resources and caching +- Scalable architecture +- Remote monitoring and logging + +## Result Models + +**Source**: [`openhands/sdk/workspace/models.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/models.py) + +### CommandResult + +```python +class CommandResult(BaseModel): + stdout: str # Standard output + stderr: str # Standard error + exit_code: int # Exit code (0 = success) + duration: float # Execution time in seconds +``` + +### FileOperationResult + +```python +class FileOperationResult(BaseModel): + success: bool # Operation success status + message: str # Status message + path: str # File path +``` + +## Usage with Agents + +Workspaces integrate with agents through tools: + +```python +from openhands.sdk import Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from openhands.sdk.workspace import LocalWorkspace + +# Create workspace +workspace = LocalWorkspace(working_dir="/project") + +# Create tools with workspace +tools = [ + BashTool.create(working_dir=workspace.working_dir), + FileEditorTool.create() +] + +# Create agent +agent = Agent(llm=llm, tools=tools) +``` + +## Local vs Remote Comparison + +| Feature | LocalWorkspace | RemoteWorkspace | +|---------|---------------|-----------------| +| **Execution** | Local machine | Remote server/container | +| **Isolation** | None | Strong (Docker/API) | +| **Performance** | Fast | Network latency | +| **Security** | Host system | Sandboxed environment | +| **Setup** | Simple | Requires infrastructure | +| **Use Case** | Development | Production/Multi-user | + +## Advanced Usage + +### Custom Workspace Implementation + +```python +from 
openhands.sdk.workspace import BaseWorkspace +from openhands.sdk.workspace.models import CommandResult, FileOperationResult + +class CloudWorkspace(BaseWorkspace): + working_dir: str + cloud_instance_id: str + + def execute_command( + self, + command: str, + cwd: str | None = None, + timeout: float = 30.0 + ) -> CommandResult: + # Execute on cloud instance + response = self.cloud_api.run_command( + instance_id=self.cloud_instance_id, + command=command + ) + return CommandResult( + stdout=response.stdout, + stderr=response.stderr, + exit_code=response.exit_code, + duration=response.duration + ) + + def file_upload( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + # Upload to cloud storage + ... + + def file_download( + self, + source_path: str, + destination_path: str + ) -> FileOperationResult: + # Download from cloud storage + ... +``` + +### Error Handling + +```python +from openhands.sdk.workspace import LocalWorkspace + +workspace = LocalWorkspace(working_dir="/project") + +try: + result = workspace.execute_command("risky_command", timeout=60.0) + if result.exit_code != 0: + print(f"Command failed: {result.stderr}") +except TimeoutError: + print("Command timed out") +except Exception as e: + print(f"Execution error: {e}") +``` + +## Best Practices + +1. **Use Context Managers**: Always use `with` statements for proper cleanup +2. **Set Appropriate Timeouts**: Prevent hanging on long-running commands +3. **Validate Working Directory**: Ensure paths exist before operations +4. **Handle Errors**: Check exit codes and handle exceptions +5. **Choose Right Workspace**: Local for development, remote for production +6. 
**Resource Limits**: Set appropriate resource limits for remote workspaces + +## See Also + +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based sandboxing +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Remote agent execution server +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace usage examples diff --git a/sdk/arch/tools/bash.mdx b/sdk/arch/tools/bash.mdx new file mode 100644 index 00000000..3497307c --- /dev/null +++ b/sdk/arch/tools/bash.mdx @@ -0,0 +1,288 @@ +--- +title: BashTool +description: Execute bash commands with persistent session support, timeout control, and environment management. +--- + +BashTool enables agents to execute bash commands in a persistent session with full control over working directory, environment variables, and execution timeout. + +**Source**: [`openhands/tools/execute_bash/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash) + +## Overview + +BashTool provides: +- Persistent bash session across multiple commands +- Environment variable management +- Timeout control for long-running commands +- Working directory configuration +- Support for both local and remote execution + +## Usage + +### Basic Usage + +```python +from openhands.tools import BashTool + +# Create tool +bash_tool = BashTool.create() + +# Use with agent +from openhands.sdk import Agent + +agent = Agent( + llm=llm, + tools=[bash_tool] +) +``` + +### With Configuration + +```python +bash_tool = BashTool.create( + working_dir="/project/path", + timeout=60.0 # 60 seconds +) +``` + +## Action Model + +**Source**: [`openhands/tools/execute_bash/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/definition.py) + +```python +class BashAction(Action): + command: str # Bash 
command to execute + thought: str = "" # Optional reasoning +``` + +### Example + +```python +from openhands.tools import BashAction + +action = BashAction( + command="ls -la", + thought="List files to understand directory structure" +) +``` + +## Observation Model + +```python +class BashObservation(Observation): + output: str # Command output (stdout + stderr) + exit_code: int # Exit code (0 = success) +``` + +### Example + +```python +# Successful execution +observation = BashObservation( + output="file1.txt\nfile2.py\n", + exit_code=0 +) + +# Failed execution +observation = BashObservation( + output="command not found: invalid_cmd\n", + exit_code=127 +) +``` + +## Features + +### Persistent Session + +Commands execute in the same bash session, preserving: +- Environment variables +- Working directory changes +- Shell state + +```python +# Commands share one persistent session, so state carries over +BashAction(command="export API_KEY=secret") + +# A later action in the same session can read it +BashAction(command="echo $API_KEY") # Outputs: secret +``` + +### Terminal Types + +**Source**: [`openhands/tools/execute_bash/terminal/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash/terminal) + +BashTool supports multiple terminal implementations: + +- **SubprocessTerminal**: Direct subprocess execution (default) +- **TmuxTerminal**: Tmux-based persistent sessions + +### Timeout Control + +Commands automatically time out after the specified duration: + +```python +bash_tool = BashTool.create(timeout=30.0) # 30 second timeout + +# Long-running command will be terminated +action = BashAction(command="sleep 60") # Timeout after 30s +``` + +### Environment Management + +Set custom environment variables: + +```python +# Via workspace secrets +from openhands.sdk import Conversation + +conversation = Conversation( + agent=agent, + secrets={ + "DATABASE_URL": "postgres://...", + "API_KEY": "secret" + } +) +``` + +See 
[`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). + +## Common Use Cases + +### File Operations + +```python +# Create directory +BashAction(command="mkdir -p /path/to/dir") + +# Copy files +BashAction(command="cp source.txt dest.txt") + +# Find files +BashAction(command="find . -name '*.py'") +``` + +### Build and Test + +```python +# Install dependencies +BashAction(command="pip install -r requirements.txt") + +# Run tests +BashAction(command="pytest tests/") + +# Build project +BashAction(command="npm run build") +``` + +### Git Operations + +```python +# Clone repository +BashAction(command="git clone https://github.com/user/repo.git") + +# Create branch +BashAction(command="git checkout -b feature-branch") + +# Commit changes +BashAction(command='git commit -m "Add feature"') +``` + +### System Information + +```python +# Check disk space +BashAction(command="df -h") + +# List processes +BashAction(command="ps aux") + +# Network information +BashAction(command="ifconfig") +``` + +## Best Practices + +1. **Set Appropriate Timeouts**: Prevent hanging on long commands +2. **Use Absolute Paths**: Or configure working directory explicitly +3. **Check Exit Codes**: Verify command success in agent logic +4. **Escape Special Characters**: Properly quote arguments +5. **Avoid Interactive Commands**: BashTool works best with non-interactive commands +6. 
**Use Security Analysis**: Enable for sensitive operations + +## Security Considerations + +### Risk Assessment + +BashTool actions have varying risk levels: + +- **LOW**: Read operations (`ls`, `cat`, `grep`) +- **MEDIUM**: Write operations (`touch`, `mkdir`, `echo >`) +- **HIGH**: Destructive operations (`rm -rf`, `sudo`, `chmod`) + +### Enable Security + +```python +from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk + +agent = Agent( + llm=llm, + tools=[BashTool.create()], + security_analyzer=LLMSecurityAnalyzer(llm=llm), + confirmation_policy=ConfirmOnHighRisk() +) +``` + +### Sandboxing + +Use DockerWorkspace for isolation: + +```python +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + image="ubuntu:22.04" +) + +conversation = Conversation(agent=agent, workspace=workspace) +``` + +## Error Handling + +### Common Exit Codes + +- `0`: Success +- `1`: General error +- `2`: Misuse of shell builtin +- `126`: Command not executable +- `127`: Command not found +- `130`: Terminated by Ctrl+C +- `137`: Killed by SIGKILL (timeout) + +### Handling Failures + +```python +# Agent can check observation +if observation.exit_code != 0: + # Handle error based on output + if "permission denied" in observation.output.lower(): + # Retry with different approach + pass +``` + +## Implementation Details + +**Source**: [`openhands/tools/execute_bash/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/impl.py) + +The tool uses a terminal interface that: +1. Initializes a persistent bash session +2. Executes commands with timeout support +3. Captures stdout and stderr +4. Returns exit codes +5. 
Handles session cleanup + +## See Also + +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For file manipulation +- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools +- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/browser_use.mdx b/sdk/arch/tools/browser_use.mdx new file mode 100644 index 00000000..bd52db73 --- /dev/null +++ b/sdk/arch/tools/browser_use.mdx @@ -0,0 +1,101 @@ +--- +title: BrowserUseTool +description: Web browsing and interaction capabilities powered by browser-use integration. +--- + +BrowserUseTool enables agents to interact with web pages, navigate websites, and extract web content through an integrated browser. + +**Source**: [`openhands/tools/browser_use/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/browser_use) + +## Overview + +BrowserUseTool provides: +- Web page navigation +- Element interaction (click, type, etc.) +- Content extraction +- Screenshot capture +- JavaScript execution + +## Usage + +```python +from openhands.tools import BrowserUseTool + +agent = Agent(llm=llm, tools=[BrowserUseTool.create()]) +``` + +## Features + +### Web Navigation + +- Navigate to URLs +- Follow links +- Browser back/forward +- Page refresh + +### Element Interaction + +- Click elements +- Fill forms +- Submit data +- Select dropdowns + +### Content Extraction + +- Extract text content +- Get element attributes +- Capture screenshots +- Parse structured data + +## Use Cases + +### Web Scraping + +```python +# Navigate to page and extract data +# Agent can use browser to: +# 1. Navigate to target URL +# 2. Wait for content to load +# 3. Extract desired information +# 4. Return structured data +``` + +### Web Testing + +```python +# Test web applications +# Agent can: +# 1. Navigate to application +# 2. Fill out forms +# 3. Click buttons +# 4. 
Verify expected behavior +``` + +### Research + +```python +# Research information online +# Agent can: +# 1. Search for information +# 2. Navigate search results +# 3. Extract relevant content +# 4. Synthesize findings +``` + +## Integration + +BrowserUseTool is powered by the [browser-use](https://github.com/browser-use/browser-use) library, providing robust web automation capabilities. + +## Best Practices + +1. **Handle Loading**: Wait for page content to load +2. **Error Handling**: Handle navigation and interaction failures +3. **Rate Limiting**: Be respectful of target websites +4. **Security**: Avoid sensitive operations in browser +5. **Timeouts**: Set appropriate timeouts for operations + +## See Also + +- **[browser-use](https://github.com/browser-use/browser-use)** - Underlying browser automation library +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For local command execution +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For processing extracted content diff --git a/sdk/arch/tools/file_editor.mdx b/sdk/arch/tools/file_editor.mdx new file mode 100644 index 00000000..fff65d25 --- /dev/null +++ b/sdk/arch/tools/file_editor.mdx @@ -0,0 +1,338 @@ +--- +title: FileEditorTool +description: Edit files with diff-based operations, undo support, and intelligent line-based modifications. +--- + +FileEditorTool provides powerful file editing capabilities with diff-based operations, undo/redo support, and intelligent line-based modifications. It's designed for precise code and text file manipulation. 
+ +**Source**: [`openhands/tools/file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/file_editor) + +## Overview + +FileEditorTool provides: +- View file contents with line numbers +- Insert, delete, and replace lines +- String-based find-and-replace +- Undo/redo support +- Automatic diff generation +- File history tracking + +## Usage + +```python +from openhands.tools import FileEditorTool + +# Create tool +file_editor = FileEditorTool.create() + +# Use with agent +agent = Agent(llm=llm, tools=[file_editor]) +``` + +## Available Commands + +**Source**: [`openhands/tools/file_editor/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/definition.py) + +### view +View file contents with line numbers. + +```python +FileEditAction( + command="view", + path="script.py" +) +``` + +Optional parameters: +- `view_range=[start, end]`: View specific line range + +### create +Create a new file with content. + +```python +FileEditAction( + command="create", + path="newfile.py", + file_text="print('Hello, World!')\n" +) +``` + +### str_replace +Replace a string in the file. + +```python +FileEditAction( + command="str_replace", + path="script.py", + old_str="old_function()", + new_str="new_function()" +) +``` + +### insert +Insert text after a specific line. + +```python +FileEditAction( + command="insert", + path="script.py", + insert_line=10, + new_str=" # New code here\n" +) +``` + +### undo_edit +Undo the last edit operation. 
+ +```python +FileEditAction( + command="undo_edit", + path="script.py" +) +``` + +## Action Model + +```python +class FileEditAction(Action): + command: Literal["view", "create", "str_replace", "insert", "undo_edit"] + path: str # File path + file_text: str | None = None # For create + old_str: str | None = None # For str_replace + new_str: str | None = None # For str_replace/insert + insert_line: int | None = None # For insert + view_range: list[int] | None = None # For view +``` + +## Observation Model + +```python +class FileEditObservation(Observation): + content: str # Result message or file content + success: bool # Operation success status + diff: str | None = None # Unified diff for changes +``` + +## Features + +### Diff Generation + +Automatic diff generation for all modifications: + +```python +# After edit +observation = FileEditObservation( + content="File edited successfully", + success=True, + diff=""" +--- script.py ++++ script.py +@@ -1,3 +1,3 @@ + def main(): +- print("old") ++ print("new") +""" +) +``` + +### Edit History + +Track file modification history with undo support: + +```python +# Edit file +action1 = FileEditAction(command="str_replace", path="file.py", ...) + +# Make another edit +action2 = FileEditAction(command="insert", path="file.py", ...) 
+ +# Undo last edit +action3 = FileEditAction(command="undo_edit", path="file.py") +``` + +### Line-Based Operations + +All operations work with line numbers for precision: + +```python +# View specific lines +FileEditAction( + command="view", + path="large_file.py", + view_range=[100, 150] # View lines 100-150 +) + +# Insert at specific line +FileEditAction( + command="insert", + path="script.py", + insert_line=25, + new_str=" new_code()\n" +) +``` + +### String Replacement + +Find and replace with exact matching: + +```python +# Must match exactly including whitespace +FileEditAction( + command="str_replace", + path="config.py", + old_str="DEBUG = False\nLOG_LEVEL = 'INFO'", + new_str="DEBUG = True\nLOG_LEVEL = 'DEBUG'" +) +``` + +## Common Use Cases + +### Creating Files + +```python +# Create Python script +FileEditAction( + command="create", + path="hello.py", + file_text="#!/usr/bin/env python3\nprint('Hello, World!')\n" +) + +# Create configuration file +FileEditAction( + command="create", + path="config.json", + file_text='{"setting": "value"}\n' +) +``` + +### Viewing Files + +```python +# View entire file +FileEditAction(command="view", path="README.md") + +# View specific section +FileEditAction( + command="view", + path="large_file.py", + view_range=[1, 50] +) + +# View end of file +FileEditAction( + command="view", + path="log.txt", + view_range=[-20, -1] # Last 20 lines +) +``` + +### Refactoring Code + +```python +# Rename function +FileEditAction( + command="str_replace", + path="module.py", + old_str="def old_name(arg):", + new_str="def new_name(arg):" +) + +# Add import +FileEditAction( + command="insert", + path="script.py", + insert_line=0, + new_str="import numpy as np\n" +) + +# Fix bug +FileEditAction( + command="str_replace", + path="buggy.py", + old_str=" if x = 5:", + new_str=" if x == 5:" +) +``` + +## Best Practices + +1. **View Before Editing**: Always view file content first +2. 
**Exact String Matching**: Ensure `old_str` matches exactly +3. **Include Context**: Include surrounding lines for uniqueness +4. **Use Line Numbers**: View with line numbers for precise edits +5. **Check Success**: Verify `observation.success` before proceeding +6. **Review Diffs**: Check generated diffs for accuracy +7. **Use Undo Sparingly**: Undo only when necessary + +## Error Handling + +### Common Errors + +```python +# File not found +FileEditObservation( + content="Error: File 'missing.py' not found", + success=False +) + +# String not found +FileEditObservation( + content="Error: old_str not found in file", + success=False +) + +# Multiple matches +FileEditObservation( + content="Error: old_str matched multiple locations", + success=False +) + +# Invalid line number +FileEditObservation( + content="Error: insert_line out of range", + success=False +) +``` + +### Recovery Strategies + +```python +# If string not found, view file first +if not observation.success and "not found" in observation.content: + # View file to understand current content + view_action = FileEditAction(command="view", path=path) +``` + +## Implementation Details + +**Source**: [`openhands/tools/file_editor/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/impl.py) + +The editor maintains: +- **File Cache**: Efficient file content caching +- **Edit History**: Per-file undo stack +- **Diff Engine**: Unified diff generation +- **Encoding Detection**: Automatic encoding handling + +## Configuration + +**Source**: [`openhands/tools/file_editor/utils/config.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/utils/config.py) + +```python +# Constants +MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB +MAX_HISTORY_SIZE = 100 # Max undo operations +``` + +## Security Considerations + +- File operations are restricted to working directory +- No execution of file content +- Safe for user-generated content +- Automatic encoding 
detection prevents binary file issues + +## See Also + +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For file system operations +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing +- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/glob.mdx b/sdk/arch/tools/glob.mdx new file mode 100644 index 00000000..8983d0af --- /dev/null +++ b/sdk/arch/tools/glob.mdx @@ -0,0 +1,89 @@ +--- +title: GlobTool +description: Find files using glob patterns with recursive search and flexible matching. +--- + +GlobTool enables file discovery using glob patterns, supporting recursive search, wildcards, and flexible path matching. + +**Source**: [`openhands/tools/glob/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/glob) + +## Usage + +```python +from openhands.tools import GlobTool + +agent = Agent(llm=llm, tools=[GlobTool.create()]) +``` + +## Action Model + +**Source**: [`openhands/tools/glob/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/glob/definition.py) + +```python +class GlobAction(Action): + pattern: str # Glob pattern (e.g., "**/*.py") +``` + +## Observation Model + +```python +class GlobObservation(Observation): + paths: list[str] # List of matching file paths +``` + +## Pattern Syntax + +- `*`: Match any characters except `/` +- `**`: Match any characters including `/` (recursive) +- `?`: Match single character +- `[abc]`: Match any character in brackets +- `[!abc]`: Match any character not in brackets + +## Examples + +### Find Python Files + +```python +GlobAction(pattern="**/*.py") +# Returns: ["src/main.py", "tests/test_main.py", ...] +``` + +### Find Specific Files + +```python +GlobAction(pattern="**/test_*.py") +# Returns: ["tests/test_api.py", "tests/test_utils.py", ...] 
+``` + +### Multiple Extensions + +```python +GlobAction(pattern="**/*.{py,js,ts}") +# Returns: ["script.py", "app.js", "types.ts", ...] +``` + +### Current Directory Only + +```python +GlobAction(pattern="*.txt") +# Returns: ["readme.txt", "notes.txt", ...] +``` + +## Common Use Cases + +- **Code Discovery**: `**/*.py` - Find all Python files +- **Test Files**: `**/test_*.py` - Find test files +- **Configuration**: `**/*.{json,yaml,yml}` - Find config files +- **Documentation**: `**/*.md` - Find markdown files + +## Best Practices + +1. **Use Recursive Patterns**: `**/*` for deep searches +2. **Specific Extensions**: Narrow results with extensions +3. **Combine with GrepTool**: Find files, then search content +4. **Check Results**: Handle empty result lists + +## See Also + +- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative file operations diff --git a/sdk/arch/tools/grep.mdx b/sdk/arch/tools/grep.mdx new file mode 100644 index 00000000..bd879318 --- /dev/null +++ b/sdk/arch/tools/grep.mdx @@ -0,0 +1,140 @@ +--- +title: GrepTool +description: Search file contents using regex patterns with context and match highlighting. +--- + +GrepTool enables content search across files using regex patterns, providing context around matches and detailed results. + +**Source**: [`openhands/tools/grep/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/grep) + +## Usage + +```python +from openhands.tools import GrepTool + +agent = Agent(llm=llm, tools=[GrepTool.create()]) +``` + +## Action Model + +**Source**: [`openhands/tools/grep/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/grep/definition.py) + +```python +class GrepAction(Action): + pattern: str # Regex pattern to search + path: str = "." 
# Directory or file to search + case_sensitive: bool = False # Case sensitivity +``` + +## Observation Model + +```python +class GrepObservation(Observation): + matches: list[dict] # List of matches with context + # Each match contains: + # - file: str - File path + # - line: int - Line number + # - content: str - Matching line +``` + +## Examples + +### Search for Function Definition + +```python +GrepAction( + pattern=r"def\s+\w+\(", + path="src/", + case_sensitive=False +) +# Returns: [ +# {"file": "src/main.py", "line": 10, "content": "def process_data(x):"}, +# ... +# ] +``` + +### Case-Sensitive Search + +```python +GrepAction( + pattern="TODO", + path=".", + case_sensitive=True +) +# Only matches exact case "TODO" +``` + +### Search Specific File + +```python +GrepAction( + pattern="import.*pandas", + path="script.py" +) +``` + +## Pattern Syntax + +Supports Python regex patterns: +- `.`: Any character +- `*`: Zero or more +- `+`: One or more +- `?`: Optional +- `[]`: Character class +- `()`: Group +- `|`: Alternation +- `^`: Line start +- `$`: Line end + +## Common Use Cases + +### Find TODOs + +```python +GrepAction(pattern=r"TODO|FIXME|XXX", path=".") +``` + +### Find Imports + +```python +GrepAction(pattern=r"^import |^from .* import ", path="src/") +``` + +### Find API Keys (for security review) + +```python +GrepAction(pattern=r"api[_-]key|secret|password", path=".") +``` + +### Find Function Calls + +```python +GrepAction(pattern=r"database\.query\(", path=".") +``` + +## Best Practices + +1. **Escape Special Characters**: Use `\` for regex special chars +2. **Use Anchors**: `^` and `$` for line boundaries +3. **Case Insensitive Default**: Unless exact case matters +4. **Narrow Search Paths**: Search specific directories +5. **Combine with GlobTool**: Find files first, then grep + +## Workflow Pattern + +```python +# 1. Find relevant files +glob_action = GlobAction(pattern="**/*.py") + +# 2. 
Search content in those files +grep_action = GrepAction( + pattern="class.*Exception", + path="src/" +) +``` + +## See Also + +- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - View/edit files +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative with `grep` command diff --git a/sdk/arch/tools/overview.mdx b/sdk/arch/tools/overview.mdx new file mode 100644 index 00000000..aadf3f01 --- /dev/null +++ b/sdk/arch/tools/overview.mdx @@ -0,0 +1,185 @@ +--- +title: Tools Overview +description: Pre-built tools for common agent operations including bash execution, file editing, and code search. +--- + +The `openhands.tools` package provides a collection of pre-built, production-ready tools for common agent operations. These tools enable agents to interact with files, execute commands, search code, and manage tasks. + +**Source**: [`openhands/tools/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools) + +## Available Tools + +### Core Tools + +- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Execute bash commands with timeout and environment support +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Edit files with diff-based operations and undo support +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning workflows + +### Search Tools + +- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files using glob patterns +- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents with regex support + +### Specialized Tools + +- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Track and manage agent tasks +- **[BrowserUseTool](/sdk/architecture/tools/browser_use.mdx)** - Web browsing and interaction + +## Quick Start + +### Using Individual Tools + +```python +from openhands.sdk import Agent, LLM +from openhands.tools import BashTool, FileEditorTool +from 
pydantic import SecretStr
+
+agent = Agent(
+    llm=LLM(
+        model="anthropic/claude-sonnet-4-20250514",
+        api_key=SecretStr("your-api-key")
+    ),
+    tools=[
+        BashTool.create(),
+        FileEditorTool.create()
+    ]
+)
+```
+
+### Using Tool Presets
+
+```python
+from openhands.tools.preset import get_default_tools, get_planning_tools
+
+# Default toolset for general tasks
+default_tools = get_default_tools()
+
+# Specialized toolset for planning workflows
+planning_tools = get_planning_tools()
+
+agent = Agent(llm=llm, tools=default_tools)
+```
+
+## Tool Structure
+
+All tools follow a consistent structure:
+
+```mermaid
+graph TD
+    Tool[Tool Definition] --> Action[Action Model]
+    Tool --> Observation[Observation Model]
+    Tool --> Executor[Executor Implementation]
+
+    Action --> Params[Input Parameters]
+    Observation --> Result[Output Data]
+    Executor --> Execute["execute() method"]
+
+    style Tool fill:#e1f5fe
+    style Action fill:#fff3e0
+    style Observation fill:#e8f5e8
+    style Executor fill:#f3e5f5
+```
+
+### Tool Components
+
+1. **Action**: Input model defining tool parameters
+2. **Observation**: Output model containing execution results
+3. 
**Executor**: Implementation that executes the tool logic + +## Tool Presets + +**Source**: [`openhands/tools/preset/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/preset) + +### Default Preset + +**Source**: [`openhands/tools/preset/default.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/default.py) + +General-purpose toolset for most tasks: + +```python +from openhands.tools.preset import get_default_tools + +tools = get_default_tools() +# Includes: BashTool, FileEditorTool, GlobTool, GrepTool +``` + +### Planning Preset + +**Source**: [`openhands/tools/preset/planning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py) + +Optimized for planning and multi-file workflows: + +```python +from openhands.tools.preset import get_planning_tools + +tools = get_planning_tools() +# Includes: BashTool, PlanningFileEditorTool, GlobTool, GrepTool, TaskTrackerTool +``` + +## Creating Custom Tools + +See the [Tool Definition Guide](/sdk/architecture/sdk/tool.mdx) for creating custom tools. + +## Tool Security + +Tools support security risk assessment: + +```python +from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk + +agent = Agent( + llm=llm, + tools=[BashTool.create(), FileEditorTool.create()], + security_analyzer=LLMSecurityAnalyzer(llm=llm), + confirmation_policy=ConfirmOnHighRisk() +) +``` + +See [Security Documentation](/sdk/architecture/sdk/security.mdx) for more details. + +## Tool Configuration + +### Working Directory + +Most tools operate relative to a working directory: + +```python +from openhands.tools import BashTool + +bash_tool = BashTool.create(working_dir="/project/path") +``` + +### Timeout Settings + +Configure execution timeouts: + +```python +from openhands.tools import BashTool + +bash_tool = BashTool.create(timeout=60.0) # 60 seconds +``` + +## Best Practices + +1. **Use Presets**: Start with tool presets for common workflows +2. 
**Configure Timeouts**: Set appropriate timeouts for tools +3. **Provide Context**: Use working directories effectively +4. **Enable Security**: Add security analysis for sensitive operations +5. **Filter Tools**: Use `filter_tools_regex` to limit available tools +6. **Test Locally**: Verify tools work in your environment + +## Tool Examples + +Each tool has comprehensive examples: + +- **[Bash Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Command execution +- **[File Editor Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - File manipulation +- **[Planning Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Planning workflows +- **[Task Tracker Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Task management + +## See Also + +- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools +- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents +- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Complete examples diff --git a/sdk/arch/tools/planning_file_editor.mdx b/sdk/arch/tools/planning_file_editor.mdx new file mode 100644 index 00000000..e176c93b --- /dev/null +++ b/sdk/arch/tools/planning_file_editor.mdx @@ -0,0 +1,128 @@ +--- +title: PlanningFileEditorTool +description: Multi-file editing tool optimized for planning workflows with batch operations. +--- + +PlanningFileEditorTool extends FileEditorTool with multi-file editing capabilities optimized for planning agent workflows. 
+ +**Source**: [`openhands/tools/planning_file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/planning_file_editor) + +## Overview + +PlanningFileEditorTool provides: +- All FileEditorTool capabilities +- Optimized for planning workflows +- Batch file operations +- Coordination with TaskTrackerTool + +## Usage + +```python +from openhands.tools import PlanningFileEditorTool + +agent = Agent(llm=llm, tools=[PlanningFileEditorTool.create()]) +``` + +## Relation to FileEditorTool + +PlanningFileEditorTool inherits all FileEditorTool commands: +- `view`: View file contents +- `create`: Create new files +- `str_replace`: Replace strings +- `insert`: Insert lines +- `undo_edit`: Undo changes + +See [FileEditorTool](/sdk/architecture/tools/file_editor.mdx) for detailed command documentation. + +## Planning Workflow Integration + +```mermaid +graph TD + Plan[Create Task Plan] --> TaskTracker[TaskTrackerTool] + TaskTracker --> Edit[Edit Files] + Edit --> PlanningEditor[PlanningFileEditorTool] + PlanningEditor --> UpdateTasks[Update Task Status] + UpdateTasks --> TaskTracker + + style Plan fill:#fff3e0 + style Edit fill:#e1f5fe + style UpdateTasks fill:#e8f5e8 +``` + +## Usage in Planning Workflows + +See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): + +```python +from openhands.tools.preset import get_planning_tools + +# Get planning toolset (includes PlanningFileEditorTool) +tools = get_planning_tools() + +agent = Agent(llm=llm, tools=tools) +``` + +## Multi-File Workflow Example + +```python +# 1. Plan tasks +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Create config file", status="todo"), + Task(title="Create main script", status="todo"), + Task(title="Create tests", status="todo") + ] +) + +# 2. 
Create files +PlanningFileEditAction( + command="create", + path="config.yaml", + file_text="settings:\n debug: true\n" +) + +PlanningFileEditAction( + command="create", + path="main.py", + file_text="import yaml\n\nif __name__ == '__main__':\n pass\n" +) + +# 3. Update task status +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Create config file", status="done"), + Task(title="Create main script", status="done"), + Task(title="Create tests", status="in_progress") + ] +) +``` + +## Best Practices + +1. **Use with TaskTrackerTool**: Coordinate file edits with task status +2. **Plan Before Editing**: Create task plan first +3. **Update Progress**: Mark tasks complete after edits +4. **Follow Workflow**: Plan → Edit → Update → Repeat +5. **Use Planning Preset**: Get all planning tools together + +## When to Use + +Use PlanningFileEditorTool when: +- Building complex multi-file projects +- Following structured planning workflows +- Coordinating with task tracking +- Need agent to manage implementation phases + +Use regular FileEditorTool for: +- Simple file editing tasks +- Single-file modifications +- Ad-hoc editing without planning + +## See Also + +- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Base file editing capabilities +- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Task management +- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Complete planning toolset +- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Full workflow example diff --git a/sdk/arch/tools/task_tracker.mdx b/sdk/arch/tools/task_tracker.mdx new file mode 100644 index 00000000..73966ef4 --- /dev/null +++ b/sdk/arch/tools/task_tracker.mdx @@ -0,0 +1,146 @@ +--- +title: TaskTrackerTool +description: Track and manage agent tasks with status updates and structured task lists. 
+--- + +TaskTrackerTool enables agents to create, update, and manage task lists for complex multi-step workflows. + +**Source**: [`openhands/tools/task_tracker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/task_tracker) + +## Usage + +```python +from openhands.tools import TaskTrackerTool + +agent = Agent(llm=llm, tools=[TaskTrackerTool.create()]) +``` + +## Action Model + +**Source**: [`openhands/tools/task_tracker/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/task_tracker/definition.py) + +```python +class TaskTrackerAction(Action): + command: Literal["view", "plan"] + task_list: list[Task] | None = None # For plan command +``` + +### Task Model + +```python +class Task: + title: str # Task title + status: Literal["todo", "in_progress", "done"] # Task status + notes: str | None = None # Optional notes +``` + +## Observation Model + +```python +class TaskTrackerObservation(Observation): + task_list: list[Task] # Current task list + message: str # Status message +``` + +## Commands + +### view +View current task list. + +```python +TaskTrackerAction(command="view") +``` + +### plan +Create or update task list. 
+ +```python +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Setup environment", status="done"), + Task(title="Write code", status="in_progress"), + Task(title="Run tests", status="todo") + ] +) +``` + +## Usage Patterns + +### Initialize Task List + +```python +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Analyze requirements", status="todo"), + Task(title="Design solution", status="todo"), + Task(title="Implement features", status="todo"), + Task(title="Write tests", status="todo"), + Task(title="Deploy", status="todo") + ] +) +``` + +### Update Progress + +```python +TaskTrackerAction( + command="plan", + task_list=[ + Task(title="Analyze requirements", status="done"), + Task(title="Design solution", status="in_progress"), + Task(title="Implement features", status="todo"), + Task(title="Write tests", status="todo"), + Task(title="Deploy", status="todo") + ] +) +``` + +### Check Current Status + +```python +TaskTrackerAction(command="view") +# Returns current task list with status +``` + +## Best Practices + +1. **Plan Early**: Create task list at workflow start +2. **Update Regularly**: Mark tasks as progress happens +3. **Use Notes**: Add details for complex tasks +4. **One Task Active**: Focus on one "in_progress" task +5. 
**Mark Complete**: Set "done" when finished + +## Task Status Workflow + +```mermaid +graph LR + TODO[todo] -->|Start work| PROGRESS[in_progress] + PROGRESS -->|Complete| DONE[done] + DONE -->|Reopen if needed| TODO + + style TODO fill:#fff3e0 + style PROGRESS fill:#e1f5fe + style DONE fill:#c8e6c9 +``` + +## Example: Planning Agent + +See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): + +```python +# Planning agent uses TaskTrackerTool for workflow management +from openhands.tools.preset import get_planning_tools + +agent = Agent( + llm=llm, + tools=get_planning_tools() # Includes TaskTrackerTool +) +``` + +## See Also + +- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning +- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Planning toolset +- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Complete workflow diff --git a/sdk/arch/workspace/docker.mdx b/sdk/arch/workspace/docker.mdx new file mode 100644 index 00000000..4c26fd52 --- /dev/null +++ b/sdk/arch/workspace/docker.mdx @@ -0,0 +1,330 @@ +--- +title: DockerWorkspace +description: Execute agent operations in isolated Docker containers with automatic container lifecycle management. +--- + +DockerWorkspace provides isolated execution environments using Docker containers. It automatically manages container lifecycle, networking, and resource allocation. 
+ +**Source**: [`openhands/workspace/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/docker) + +## Overview + +DockerWorkspace provides: +- Automatic container creation and cleanup +- Network isolation and port management +- Custom or pre-built Docker images +- Environment variable forwarding +- File system mounting +- Resource limits and controls + +## Usage + +### Basic Usage + +```python +from openhands.workspace import DockerWorkspace + +workspace = DockerWorkspace( + working_dir="/workspace", + base_image="python:3.12" +) + +with workspace: + result = workspace.execute_command("python --version") + print(result.stdout) # Python 3.12.x +``` + +### With Pre-built Image + +```python +workspace = DockerWorkspace( + working_dir="/workspace", + server_image="ghcr.io/all-hands-ai/agent-server:latest" +) +``` + +## Configuration + +**Source**: [`openhands/workspace/docker/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/docker/workspace.py) + +### Core Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `working_dir` | `str` | `"/workspace"` | Working directory in container | +| `base_image` | `str \| None` | `None` | Base image to build agent server from | +| `server_image` | `str \| None` | `None` | Pre-built agent server image | +| `host_port` | `int \| None` | `None` | Host port to bind (auto-assigned if None) | +| `forward_env` | `list[str]` | `["DEBUG"]` | Environment variables to forward | +| `container_name` | `str \| None` | `None` | Container name (auto-generated if None) | +| `platform` | `str \| None` | `None` | Target platform (e.g., "linux/amd64") | + +### Using Base Image + +Build agent server on top of custom base image: + +```python +workspace = DockerWorkspace( + base_image="ubuntu:22.04", + working_dir="/workspace" +) +``` + +Agent server components are installed on top of the base image. 
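A common pattern is to prefer a pre-built server image when one is available and fall back to building on a base image otherwise. A minimal sketch — the environment variable names here are illustrative, not part of the SDK:

```python
import os

def workspace_kwargs(working_dir: str = "/workspace") -> dict:
    """Prefer a pre-built server image; fall back to building from a base image."""
    server_image = os.environ.get("AGENT_SERVER_IMAGE")  # hypothetical env var
    if server_image:
        return {"working_dir": working_dir, "server_image": server_image}
    # Build on a base image when no pre-built server image is configured
    return {
        "working_dir": working_dir,
        "base_image": os.environ.get("AGENT_BASE_IMAGE", "python:3.12"),
    }

# workspace = DockerWorkspace(**workspace_kwargs())
```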
+ +### Using Pre-built Server Image + +Use pre-built agent server image: + +```python +workspace = DockerWorkspace( + server_image="ghcr.io/all-hands-ai/agent-server:latest", + working_dir="/workspace" +) +``` + +Faster startup, no build time required. + +## Lifecycle Management + +### Automatic Cleanup + +```python +with DockerWorkspace(base_image="python:3.12") as workspace: + # Container created + workspace.execute_command("pip install requests") + # Commands execute in container +# Container automatically stopped and removed +``` + +### Manual Management + +```python +workspace = DockerWorkspace(base_image="python:3.12") + +# Manually start (happens automatically in context manager) +# Use workspace +result = workspace.execute_command("ls") + +# Manually cleanup +workspace.__exit__(None, None, None) +``` + +## Environment Configuration + +### Forward Environment Variables + +```python +import os + +os.environ["DATABASE_URL"] = "postgres://..." +os.environ["API_KEY"] = "secret" + +workspace = DockerWorkspace( + base_image="python:3.12", + forward_env=["DATABASE_URL", "API_KEY", "DEBUG"] +) + +with workspace: + result = workspace.execute_command("echo $DATABASE_URL") + # Outputs: postgres://... +``` + +### Custom Container Name + +```python +workspace = DockerWorkspace( + base_image="python:3.12", + container_name="my-agent-container" +) +``` + +Useful for debugging and monitoring. + +### Platform Specification + +```python +workspace = DockerWorkspace( + base_image="python:3.12", + platform="linux/amd64" # Force specific platform +) +``` + +Useful for Apple Silicon Macs running amd64 images. 
+ +## Port Management + +DockerWorkspace automatically finds available ports for container communication: + +```python +workspace = DockerWorkspace( + base_image="python:3.12", + host_port=None # Auto-assign (default) +) + +# Or specify explicit port +workspace = DockerWorkspace( + base_image="python:3.12", + host_port=8000 # Use specific port +) +``` + +## File Operations + +### File Upload + +```python +workspace.file_upload( + source_path="local_file.txt", + destination_path="/workspace/file.txt" +) +``` + +### File Download + +```python +workspace.file_download( + source_path="/workspace/output.txt", + destination_path="local_output.txt" +) +``` + +## Building Docker Images + +DockerWorkspace can build custom agent server images: + +**Source**: [`openhands/agent_server/docker/build.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/agent_server/docker/build.py) + +```python +from openhands.agent_server.docker.build import ( + BuildOptions, + build +) + +# Build custom image +image_name = build( + BuildOptions( + base_image="ubuntu:22.04", + target="runtime", # or "dev" + platform="linux/amd64", + context_dir="." + ) +) + +# Use built image +workspace = DockerWorkspace(server_image=image_name) +``` + +## Use with Conversation + +```python +from openhands.sdk import Agent, Conversation +from openhands.tools import BashTool, FileEditorTool +from openhands.workspace import DockerWorkspace + +# Create workspace +workspace = DockerWorkspace( + base_image="python:3.12", + working_dir="/workspace" +) + +# Create agent +agent = Agent( + llm=llm, + tools=[BashTool.create(), FileEditorTool.create()] +) + +# Use in conversation +with workspace: + conversation = Conversation(agent=agent, workspace=workspace) + conversation.send_message("Create a Python web scraper") + conversation.run() +``` + +See [`examples/02_remote_agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server) for complete examples. 
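Because `container_name` can be pinned, resource caps can also be applied out-of-band with the standard `docker update` CLI. A hedged sketch — the limits and container name shown are illustrative:

```python
import subprocess

def limit_container(container_name: str, memory: str = "1g",
                    cpus: str = "1.5", apply: bool = False) -> list[str]:
    """Build a `docker update` command that caps a running container's resources."""
    cmd = ["docker", "update", "--memory", memory, "--cpus", cpus, container_name]
    if apply:
        subprocess.run(cmd, check=True)  # requires Docker and a running container
    return cmd

# limit_container("my-agent-container", apply=True)  # pair with container_name= above
```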
+
+## Security Benefits
+
+### Isolation
+
+- **Process Isolation**: Container runs separately from host
+- **File System Isolation**: Limited access to host file system
+- **Network Isolation**: Separate network namespace
+
+### Resource Limits
+
+```python
+# Resource limits are configurable via Docker
+# Set through Docker API or Dockerfile
+```
+
+### Sandboxing
+
+DockerWorkspace provides strong sandboxing:
+- Agent cannot access host file system
+- Agent cannot interfere with host processes
+- Agent operates in controlled environment
+
+## Performance Considerations
+
+### Container Startup Time
+
+- **Base Image Build**: 30-60 seconds (first time)
+- **Pre-built Image**: 5-10 seconds
+- **Subsequent Runs**: Uses cached images
+
+### Optimization Tips
+
+1. **Use Pre-built Images**: Faster than building from base image
+2. **Cache Base Images**: Docker caches layers
+3. **Minimize Image Size**: Smaller images start faster
+4. **Reuse Containers**: For multiple operations (advanced)
+
+## Troubleshooting
+
+### Container Fails to Start
+
+```bash
+# Check Docker is running
+docker ps
+
+# Check logs
+docker logs <container-name>
+
+# Verify image exists
+docker images
+```
+
+### Port Already in Use
+
+```python
+# Specify different port
+workspace = DockerWorkspace(
+    base_image="python:3.12",
+    host_port=8001  # Use alternative port
+)
+```
+
+### Permission Issues
+
+```bash
+# Ensure Docker has necessary permissions
+# On Linux, add your user to the docker group:
+sudo usermod -aG docker $USER
+```
+
+## Best Practices
+
+1. **Use Context Managers**: Always use `with` statement
+2. **Pre-build Images**: Build agent server images ahead of time
+3. **Set Resource Limits**: Configure appropriate limits
+4. **Monitor Containers**: Track resource usage
+5. **Clean Up**: Ensure containers are removed after use
+6. 
**Use Specific Tags**: Pin image versions for reproducibility + +## See Also + +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server running in container +- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Docker workspace examples diff --git a/sdk/arch/workspace/overview.mdx b/sdk/arch/workspace/overview.mdx new file mode 100644 index 00000000..6a539776 --- /dev/null +++ b/sdk/arch/workspace/overview.mdx @@ -0,0 +1,99 @@ +--- +title: Workspace Package Overview +description: Advanced workspace implementations providing sandboxed and remote execution environments. +--- + +The `openhands.workspace` package provides advanced workspace implementations for production deployments, including Docker-based sandboxing and remote API execution. 
+
+**Source**: [`openhands/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace)
+
+## Available Workspaces
+
+- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker container isolation
+- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote server execution
+
+## Workspace Hierarchy
+
+```mermaid
+graph TD
+    Base[BaseWorkspace] --> Local[LocalWorkspace]
+    Base --> Remote[RemoteWorkspace]
+    Remote --> Docker[DockerWorkspace]
+    Remote --> API[RemoteAPIWorkspace]
+
+    style Base fill:#e1f5fe
+    style Local fill:#e8f5e8
+    style Remote fill:#fff3e0
+    style Docker fill:#f3e5f5
+    style API fill:#f3e5f5
+```
+
+- **BaseWorkspace**: Core interface (in SDK)
+- **LocalWorkspace**: Direct local execution (in SDK)
+- **RemoteWorkspace**: Base for remote implementations
+- **DockerWorkspace**: Docker container execution
+- **RemoteAPIWorkspace**: API-based remote execution
+
+## Comparison
+
+| Feature | LocalWorkspace | DockerWorkspace | RemoteAPIWorkspace |
+|---------|---------------|-----------------|-------------------|
+| **Isolation** | None | Strong | Strong |
+| **Performance** | Fast | Good | Network latency |
+| **Setup** | None | Docker required | Server required |
+| **Security** | Host system | Sandboxed | Sandboxed |
+| **Use Case** | Development | Production/Testing | Distributed systems |
+
+## Quick Start
+
+### Docker Workspace
+
+```python
+from openhands.workspace import DockerWorkspace
+
+workspace = DockerWorkspace(
+    working_dir="/workspace",
+    base_image="ubuntu:22.04"
+)
+
+with workspace:
+    result = workspace.execute_command("echo 'Hello from Docker'")
+    print(result.stdout)
+```
+
+### Remote API Workspace
+
+```python
+from openhands.workspace import RemoteAPIWorkspace
+
+workspace = RemoteAPIWorkspace(
+    working_dir="/workspace",
+    api_url="https://agent-server.example.com",
+    api_key="your-api-key"
+)
+
+with workspace:
+    result = workspace.execute_command("python script.py")
+    
print(result.stdout) +``` + +## Use Cases + +### Development +Use `LocalWorkspace` for local development and testing. + +### Testing +Use `DockerWorkspace` for isolated test environments. + +### Production +Use `DockerWorkspace` or `RemoteAPIWorkspace` for production deployments. + +### Multi-User Systems +Use `RemoteAPIWorkspace` with centralized agent server. + +## See Also + +- **[SDK Workspace Interface](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker implementation +- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote API implementation +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server for remote workspaces diff --git a/sdk/arch/workspace/remote_api.mdx b/sdk/arch/workspace/remote_api.mdx new file mode 100644 index 00000000..cb8ca8a4 --- /dev/null +++ b/sdk/arch/workspace/remote_api.mdx @@ -0,0 +1,325 @@ +--- +title: RemoteAPIWorkspace +description: Connect to centralized agent servers via HTTP API for scalable distributed agent execution. +--- + +RemoteAPIWorkspace enables agent execution on remote servers through HTTP APIs. It's designed for production deployments requiring centralized agent management and multi-user support. 
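Because every call crosses the network, transient connection failures are expected. A minimal retry sketch with exponential backoff — the helper below is illustrative, not part of the SDK:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying transient network errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# result = with_retries(lambda: workspace.execute_command("python script.py"))
```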
+ +**Source**: [`openhands/workspace/remote_api/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/remote_api) + +## Overview + +RemoteAPIWorkspace provides: +- HTTP API communication with agent server +- Authentication and authorization +- Centralized resource management +- Multi-user agent execution +- Monitoring and logging + +## Usage + +### Basic Usage + +```python +from openhands.workspace import RemoteAPIWorkspace + +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com", + api_key="your-api-key" +) + +with workspace: + result = workspace.execute_command("python script.py") + print(result.stdout) +``` + +### With Agent + +```python +from openhands.sdk import Agent, Conversation +from openhands.tools import BashTool, FileEditorTool + +# Create workspace +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://agent-server.example.com", + api_key="your-api-key" +) + +# Create agent +agent = Agent( + llm=llm, + tools=[BashTool.create(), FileEditorTool.create()] +) + +# Use in conversation +conversation = Conversation(agent=agent, workspace=workspace) +conversation.send_message("Your task") +conversation.run() +``` + +## Configuration + +**Source**: [`openhands/workspace/remote_api/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/remote_api/workspace.py) + +### Parameters + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `working_dir` | `str` | Yes | Working directory on server | +| `api_url` | `str` | Yes | Agent server API URL | +| `api_key` | `str` | Yes | Authentication API key | +| `timeout` | `float` | No | Request timeout (default: 30) | + +### Example Configuration + +```python +workspace = RemoteAPIWorkspace( + working_dir="/workspace/user123", + api_url="https://agents.company.com", + api_key="sk-abc123...", + timeout=60.0 # 60 second timeout +) +``` + +## API 
Communication + +### HTTP Endpoints + +RemoteAPIWorkspace communicates with agent server endpoints: + +- `POST /api/workspace/command` - Execute commands +- `POST /api/workspace/upload` - Upload files +- `GET /api/workspace/download` - Download files +- `GET /api/health` - Health check + +### Authentication + +```python +# API key passed in Authorization header +headers = { + "Authorization": f"Bearer {api_key}" +} +``` + +### Error Handling + +```python +try: + result = workspace.execute_command("command") +except ConnectionError: + print("Failed to connect to agent server") +except TimeoutError: + print("Request timed out") +except Exception as e: + print(f"Execution error: {e}") +``` + +## File Operations + +### Upload Files + +```python +workspace.file_upload( + source_path="local_data.csv", + destination_path="/workspace/data.csv" +) +``` + +### Download Files + +```python +workspace.file_download( + source_path="/workspace/results.json", + destination_path="local_results.json" +) +``` + +### Large File Transfer + +```python +# Chunked upload for large files +workspace.file_upload( + source_path="large_dataset.zip", + destination_path="/workspace/dataset.zip" +) +``` + +## Architecture + +```mermaid +graph LR + Client[Client SDK] -->|HTTPS| API[Agent Server API] + API --> Container1[Container 1] + API --> Container2[Container 2] + API --> Container3[Container 3] + + Container1 --> Agent1[Agent] + Container2 --> Agent2[Agent] + Container3 --> Agent3[Agent] + + style Client fill:#e1f5fe + style API fill:#fff3e0 + style Container1 fill:#e8f5e8 + style Container2 fill:#e8f5e8 + style Container3 fill:#e8f5e8 +``` + +## Use Cases + +### Multi-User Platform + +```python +# Each user gets isolated workspace +user_workspace = RemoteAPIWorkspace( + working_dir=f"/workspace/{user_id}", + api_url="https://agents.platform.com", + api_key=user_api_key +) +``` + +### Scalable Agent Execution + +```python +# Server manages resource allocation +# Multiple agents run 
concurrently +# Automatic load balancing +``` + +### Centralized Monitoring + +```python +# Server tracks: +# - Resource usage per user +# - Agent execution logs +# - API usage metrics +# - Error rates and debugging info +``` + +## Security + +### Authentication + +- API key-based authentication +- Per-user access control +- Token expiration and rotation + +### Isolation + +- Separate workspaces per user +- Container-based sandboxing +- Network isolation + +### Data Protection + +- HTTPS communication +- Encrypted data transfer +- Secure file storage + +## Performance Considerations + +### Network Latency + +```python +# Latency depends on: +# - Network connection +# - Geographic distance +# - Server load + +# Optimization: +# - Use regional servers +# - Batch operations +# - Cache frequently accessed data +``` + +### Concurrent Execution + +```python +# Server handles concurrent requests +# Multiple users can run agents simultaneously +# Automatic resource management +``` + +## Deployment + +### Running Agent Server + +See [Agent Server Documentation](/sdk/architecture/agent_server/overview.mdx) for server setup: + +```bash +# Start agent server +docker run -d \ + -p 8000:8000 \ + -e API_KEY=your-secret-key \ + ghcr.io/all-hands-ai/agent-server:latest +``` + +### Using Deployed Server + +```python +# Client connects to deployed server +workspace = RemoteAPIWorkspace( + working_dir="/workspace", + api_url="https://your-server.com", + api_key="your-secret-key" +) +``` + +## Comparison with DockerWorkspace + +| Feature | DockerWorkspace | RemoteAPIWorkspace | +|---------|-----------------|-------------------| +| **Setup** | Local Docker | Remote server | +| **Network** | Local | Internet required | +| **Scaling** | Single machine | Multiple users | +| **Management** | Client-side | Server-side | +| **Latency** | Low | Network dependent | +| **Use Case** | Local dev/test | Production | + +## Best Practices + +1. **Use HTTPS**: Always use secure connections +2. 
**Rotate API Keys**: Regularly update authentication +3. **Set Timeouts**: Configure appropriate timeouts +4. **Handle Network Errors**: Implement retry logic +5. **Monitor Usage**: Track API calls and resource usage +6. **Regional Deployment**: Use nearby servers for lower latency +7. **Batch Operations**: Combine multiple operations when possible + +## Troubleshooting + +### Connection Failures + +```python +# Verify server is reachable +import requests +response = requests.get(f"{api_url}/api/health") +print(response.status_code) # Should be 200 +``` + +### Authentication Errors + +```python +# Verify API key is correct +# Check key has not expired +# Ensure proper authorization headers +``` + +### Timeout Issues + +```python +# Increase timeout for long operations +workspace = RemoteAPIWorkspace( + api_url=api_url, + api_key=api_key, + timeout=120.0 # 2 minutes +) +``` + +## See Also + +- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Local Docker execution +- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server implementation +- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface +- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace examples diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx new file mode 100644 index 00000000..41977f29 --- /dev/null +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -0,0 +1,65 @@ +--- +title: PR Review Workflow +description: Automate pull request reviews with AI-powered code analysis using GitHub Actions. +--- + + +This example is available on GitHub: [examples/github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/02_pr_review) + + +Automatically review pull requests when labeled, providing comprehensive feedback on code quality, security, and best practices. 
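The label gating described above comes down to a standard GitHub Actions event filter plus a job-level condition. A minimal, illustrative sketch of that part of the configuration (the authoritative version is the `workflow.yml` linked above; the job name here is hypothetical):

```yaml
# Illustrative sketch only - see the linked workflow.yml for the real file
on:
  pull_request:
    types: [labeled]

jobs:
  review:
    # Gate on the specific label so other labels don't start a review
    if: github.event.label.name == 'review-this'
    runs-on: ubuntu-latest
```

Because the job is gated on `github.event.label.name`, adding any other label leaves the run skipped rather than failed.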
+ +## Quick Start + +```bash +# 1. Copy workflow to your repository +cp examples/github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY + +# 3. Create a "review-this" label in your repository +# Go to Issues → Labels → New label +``` + +## Features + +- **Automatic Trigger** - Reviews start when `review-this` label is added +- **Comprehensive Analysis** - Analyzes changes in full repository context +- **Detailed Feedback** - Covers code quality, security, best practices +- **GitHub Integration** - Posts comments directly to the PR + +## Usage + +### Trigger a Review + +1. Open a pull request +2. Add the `review-this` label +3. Wait for the workflow to complete +4. Review feedback posted as PR comments + +## Configuration + +Edit `.github/workflows/pr-review.yml` to customize: + +```yaml +env: + LLM_MODEL: openhands/claude-sonnet-4-5-20250929 + # LLM_BASE_URL: 'https://custom-api.example.com' # Optional +``` + +## Review Coverage + +The agent analyzes: + +- **Code Quality** - Readability, maintainability, patterns +- **Security** - Potential vulnerabilities and risks +- **Best Practices** - Language and framework conventions +- **Improvements** - Specific actionable suggestions +- **Positive Feedback** - Recognition of good practices + +## Related Documentation + +- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/agent_script.py) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/workflow.yml) +- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/prompt.py) diff --git a/sdk/guides/github-workflows/routine-maintenance.mdx b/sdk/guides/github-workflows/routine-maintenance.mdx new file mode 100644 index 00000000..86b42168 --- /dev/null +++ b/sdk/guides/github-workflows/routine-maintenance.mdx @@ -0,0 
+1,74 @@ +--- +title: Routine Maintenance Workflow +description: Automate routine maintenance tasks with GitHub Actions and OpenHands agents. +--- + + +This example is available on GitHub: [examples/github_workflows/01_basic_action/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/01_basic_action) + + +Set up automated or scheduled GitHub Actions workflows to handle routine maintenance tasks like dependency updates, documentation improvements, and code cleanup. + +## Quick Start + +```bash +# 1. Copy workflow to your repository +cp examples/github_workflows/01_basic_action/workflow.yml .github/workflows/maintenance.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY + +# 3. Configure the prompt in workflow.yml +# See below for options +``` + +## Configuration + +### Option A: Direct Prompt + +```yaml +env: + PROMPT_STRING: 'Check for outdated dependencies and create a PR to update them' + LLM_MODEL: openhands/claude-sonnet-4-5-20250929 +``` + +### Option B: Remote Prompt + +```yaml +env: + PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' + LLM_MODEL: openhands/claude-sonnet-4-5-20250929 +``` + +## Usage + +### Manual Trigger + +1. Go to **Actions** → "Maintenance Task" +2. Click **Run workflow** +3. Optionally override prompt settings +4. 
Click **Run workflow** + +### Scheduled Runs + +Uncomment the schedule section in `workflow.yml`: + +```yaml +on: + schedule: + - cron: "0 2 * * *" # Run at 2 AM UTC daily +``` + +## Example Use Cases + +- **Dependency Updates** - Check and update outdated packages +- **Documentation** - Update docs to reflect code changes +- **Test Coverage** - Identify and improve under-tested code +- **Linting** - Apply formatting and linting fixes +- **Link Validation** - Find and report broken links + +## Related Documentation + +- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/workflow.yml) +- [GitHub Actions Cron Syntax](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule) diff --git a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx new file mode 100644 index 00000000..9f8bef79 --- /dev/null +++ b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx @@ -0,0 +1,42 @@ +--- +title: API Sandboxed Server +description: Connect to hosted API-based agent server for fully managed infrastructure. +--- + + +This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) + + +Connect to a hosted API-based agent server for fully managed infrastructure without running your own server. 
+ +## How to Run + +```bash +export LLM_API_KEY="your-api-key" +export AGENT_SERVER_URL="https://api.openhands.ai" +export AGENT_SERVER_API_KEY="your-server-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +``` + +## Key Concept + +```python +conversation = RemoteConversation( + agent_server_url="https://api.openhands.ai", + api_key=server_api_key +) +``` + +No server management required - connect to hosted API. + +## Benefits + +- **Zero Ops** - No server management +- **Scalability** - Auto-scaling infrastructure +- **Reliability** - Managed uptime and monitoring + +## Related Documentation + +- [Agent Server Architecture](/sdk/arch/agent_server/overview) +- [Remote Workspace](/sdk/arch/workspace/remote_api) diff --git a/sdk/guides/remote-agent-server/browser-with-docker.mdx b/sdk/guides/remote-agent-server/browser-with-docker.mdx new file mode 100644 index 00000000..a3230976 --- /dev/null +++ b/sdk/guides/remote-agent-server/browser-with-docker.mdx @@ -0,0 +1,44 @@ +--- +title: Browser with Docker Sandboxed Server +description: Use browser tools with Docker-sandboxed agent server for web automation. +--- + + +This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) + + +Combine browser automation capabilities with Docker isolation for secure web interaction. 
+ +## How to Run + +```bash +# Start server with browser support +docker run -p 8000:8000 \ + -e LLM_API_KEY="your-api-key" \ + ghcr.io/all-hands-ai/runtime:latest-browser + +# Run client +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +``` + +## Key Concept + +```python +conversation = RemoteConversation(agent_server_url="http://localhost:8000") +conversation.send_message("Navigate to GitHub and search for OpenHands") +``` + +Browser tools run in isolated Docker container with the agent. + +## Benefits + +- **Secure Browsing** - Isolate web interactions +- **Clean Environment** - Fresh browser state for each session +- **Resource Control** - Limit browser resource usage + +## Related Documentation + +- [Browser Tool](/sdk/arch/tools/browser_use) +- [Docker Workspace](/sdk/arch/workspace/docker) diff --git a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx new file mode 100644 index 00000000..8c7967e3 --- /dev/null +++ b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx @@ -0,0 +1,184 @@ +--- +title: Docker Workspace & Sandboxed Server +description: Run agents in isolated Docker containers for security and reproducibility. +--- + + +This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) + + +Docker workspaces provide complete isolation by running agents in containers. Use for production deployments, testing, and untrusted code execution. + +## DockerWorkspace + +Execute in isolated Docker containers with security boundaries. 
+ +### Direct Usage + +```python +from openhands.workspace import DockerWorkspace +from openhands.sdk import Conversation + +workspace = DockerWorkspace( + working_dir="/workspace", + base_image="python:3.12" +) + +with workspace: + conversation = Conversation(agent=agent, workspace=workspace) + conversation.send_message("Build a web server") + conversation.run() +# Container automatically cleaned up +``` + +See [`01_docker_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_docker_workspace.py) + +### When to Use + +- **Production** - Isolated execution environment +- **Testing** - Clean, reproducible environments +- **Untrusted code** - Run agent in sandbox +- **Multi-user** - Each user gets isolated container + +### Configuration Options + +```python +DockerWorkspace( + working_dir="/workspace", + base_image="ubuntu:22.04", # Build from base image + # OR + server_image="ghcr.io/all-hands-ai/agent-server:latest", # Pre-built image + host_port=None, # Auto-assign port + platform="linux/amd64" # Platform override +) +``` + +### Pre-built Images + +Use pre-built images for faster startup: + +```python +workspace = DockerWorkspace( + working_dir="/workspace", + server_image="ghcr.io/all-hands-ai/agent-server:latest" +) +``` + +No build time - container starts immediately. + +### File Transfer + +Copy files to/from container: + +```python +# Upload file +workspace.upload_file("/local/path/file.txt", "/workspace/file.txt") + +# Download file +workspace.download_file("/workspace/output.txt", "/local/path/output.txt") +``` + +## Docker Sandboxed Server + +Run agent server in Docker and connect remotely. 
+ +### How to Run + +```bash +# Start server in Docker +docker run -p 8000:8000 \ + -e LLM_API_KEY="your-api-key" \ + ghcr.io/all-hands-ai/runtime:latest + +# Run client +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py +``` + +### Client Connection + +```python +from openhands.sdk import RemoteConversation + +conversation = RemoteConversation( + agent_server_url="http://localhost:8000", + api_key=api_key +) +conversation.send_message("Your task") +conversation.run() +``` + +## Benefits + +**Security:** +- Complete isolation from host system +- Agent cannot access host files +- Agent cannot affect host processes + +**Resources:** +- Control CPU/memory limits +- Monitor container resource usage +- Kill containers if needed + +**Reproducibility:** +- Consistent environment across deployments +- Version-controlled container images +- Easy rollback to previous versions + +## Docker vs Local Workspace + +| Feature | LocalWorkspace | DockerWorkspace | +|---------|----------------|-----------------| +| **Security** | Low (host access) | High (isolated) | +| **Setup** | None | Docker required | +| **Performance** | Fast | Slight overhead | +| **Cleanup** | Manual | Automatic | +| **Best for** | Development | Production | + +## Best Practices + +### 1. Use Pre-built Images + +```python +# āœ… Good: Fast startup +server_image="ghcr.io/all-hands-ai/agent-server:latest" + +# āŒ Slow: Builds on every run +base_image="python:3.12" +``` + +### 2. Clean Up Containers + +Use context manager for automatic cleanup: + +```python +with workspace: + # Work with workspace + pass +# Container automatically removed +``` + +### 3. Resource Limits + +Set Docker resource limits: + +```bash +docker run --memory="2g" --cpus="1.5" \ + ghcr.io/all-hands-ai/runtime:latest +``` + +### 4. 
Volume Mounts + +Mount local directories for persistent data: + +```bash +docker run -v /local/data:/workspace/data \ + ghcr.io/all-hands-ai/runtime:latest +``` + +## Related Documentation + +- **[Browser with Docker](/sdk/guides/remote-agent-server/browser-with-docker)** - Browser in container +- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design +- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details diff --git a/sdk/guides/remote-agent-server/local-agent-server.mdx b/sdk/guides/remote-agent-server/local-agent-server.mdx new file mode 100644 index 00000000..c08c9c8d --- /dev/null +++ b/sdk/guides/remote-agent-server/local-agent-server.mdx @@ -0,0 +1,91 @@ +--- +title: Local Agent Server & Workspaces +description: Understand workspaces and run agent server locally for client-server architecture. +--- + + +This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) + + +Workspaces define where agents execute commands and access files. This guide introduces workspace concepts and demonstrates the local agent server setup. + +## Workspace Types + +| Type | Security | Setup | Use Case | +|------|----------|-------|----------| +| **LocalWorkspace** | Low (host access) | None | Development | +| **DockerWorkspace** | High (isolated) | Docker | Testing, Production | +| **RemoteAPIWorkspace** | High (isolated) | Server | Multi-user, Cloud | + +## LocalWorkspace + +Execute directly on your machine - default for standalone SDK. + +### Usage + +```python +from openhands.sdk import Conversation + +# LocalWorkspace is implicit (no workspace parameter needed) +conversation = Conversation(agent=agent) +conversation.send_message("Create a Python script") +conversation.run() +``` + +Operations run in current working directory with direct host access. 
+ +### When to Use + +- **Development** - Quick iteration and testing +- **Local files** - Direct access to local filesystem +- **Simple tasks** - No isolation needed + +### Security Considerations + +āš ļø **Warning**: Agent has full host access: +- Can modify any accessible files +- Can execute any commands +- **Not recommended for production or untrusted code** + +## Remote Agent Server + +Run agent server and connect remotely for resource isolation and scalability. + +### How to Run + +```bash +# Terminal 1: Start server +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python -m openhands.agent_server + +# Terminal 2: Run client +export LLM_API_KEY="your-api-key" +uv run python examples/02_remote_agent_server/01_convo_with_local_agent_server.py +``` + +### Client Connection + +```python +from openhands.sdk import RemoteConversation + +conversation = RemoteConversation( + agent_server_url="http://localhost:8000", + api_key=api_key +) +conversation.send_message("Your task") +conversation.run() +``` + +### Benefits + +- **Resource Isolation** - Server handles compute-intensive tasks +- **Scalability** - Multiple clients connect to same server +- **Deployment** - Separate client and execution environments +- **Security** - Isolate agent execution from client + +## Related Documentation + +- **[Docker Sandboxed Server](/sdk/guides/remote-agent-server/docker-sandboxed-server)** - Isolated execution +- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details +- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design diff --git a/sdk/guides/remote-agent-server/vscode-with-docker.mdx b/sdk/guides/remote-agent-server/vscode-with-docker.mdx new file mode 100644 index 00000000..78aa7598 --- /dev/null +++ b/sdk/guides/remote-agent-server/vscode-with-docker.mdx @@ -0,0 +1,43 @@ +--- +title: VS Code with Docker Sandboxed Server +description: Enable VS Code integration for code editing with Docker-sandboxed agent. 
+--- + + +This example is available on GitHub: [examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py) + + +Use VS Code tools with Docker-sandboxed agent server for code editing and development workflows. + +## How to Run + +```bash +# Start server with VS Code support +docker run -p 8000:8000 \ + -e LLM_API_KEY="your-api-key" \ + ghcr.io/all-hands-ai/runtime:latest-vscode + +# Run client +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py +``` + +## Key Concept + +```python +conversation = RemoteConversation(agent_server_url="http://localhost:8000") +conversation.send_message("Create a Python Flask app with routes") +``` + +Agent uses VS Code tools for editing, navigation, and refactoring in isolated environment. + +## Benefits + +- **Rich Code Editing** - VS Code features in agent workflows +- **Isolated Development** - Safe code changes in container +- **Full IDE Features** - Syntax highlighting, auto-complete, etc. + +## Related Documentation + +- [Docker Workspace](/sdk/arch/workspace/docker) From e4dd99f34fdea91970db4203bc5f3d5b78c22a8f Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 12:04:29 -0400 Subject: [PATCH 25/58] rename to agent-sdk --- sdk/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/index.mdx b/sdk/index.mdx index 3eeef3d0..98779169 100644 --- a/sdk/index.mdx +++ b/sdk/index.mdx @@ -1,5 +1,5 @@ --- -title: OpenHands SDK +title: Agent SDK description: Build AI agents that write software. A clean, modular SDK with production-ready tools. 
icon: code mode: wide From f66f00528ec2f2f487b2537e19d8c9c9ca7bea3e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 13:03:08 -0400 Subject: [PATCH 26/58] initial rewrite --- docs.json | 6 + .../agent-server/api-sandboxed-server.mdx | 222 +++++++++++ .../agent-server/browser-with-docker.mdx | 237 +++++++++++ .../agent-server/docker-sandboxed-server.mdx | 264 +++++++++++++ .../agent-server/local-agent-server.mdx | 367 ++++++++++++++++++ sdk/guides/agent-server/overview.mdx | 315 +++++++++++++++ .../agent-server/vscode-with-docker.mdx | 292 ++++++++++++++ .../api-sandboxed-server.mdx | 42 -- .../browser-with-docker.mdx | 44 --- .../docker-sandboxed-server.mdx | 184 --------- .../local-agent-server.mdx | 91 ----- .../vscode-with-docker.mdx | 43 -- 12 files changed, 1703 insertions(+), 404 deletions(-) create mode 100644 sdk/guides/agent-server/api-sandboxed-server.mdx create mode 100644 sdk/guides/agent-server/browser-with-docker.mdx create mode 100644 sdk/guides/agent-server/docker-sandboxed-server.mdx create mode 100644 sdk/guides/agent-server/local-agent-server.mdx create mode 100644 sdk/guides/agent-server/overview.mdx create mode 100644 sdk/guides/agent-server/vscode-with-docker.mdx delete mode 100644 sdk/guides/remote-agent-server/api-sandboxed-server.mdx delete mode 100644 sdk/guides/remote-agent-server/browser-with-docker.mdx delete mode 100644 sdk/guides/remote-agent-server/docker-sandboxed-server.mdx delete mode 100644 sdk/guides/remote-agent-server/local-agent-server.mdx delete mode 100644 sdk/guides/remote-agent-server/vscode-with-docker.mdx diff --git a/docs.json b/docs.json index 3c37c3de..9d330338 100644 --- a/docs.json +++ b/docs.json @@ -219,6 +219,12 @@ { "group": "Remote Agent Server", "pages": [ + "sdk/guides/agent-server/overview", + "sdk/guides/agent-server/local-agent-server", + "sdk/guides/agent-server/docker-sandboxed-server", + "sdk/guides/agent-server/api-sandboxed-server", + "sdk/guides/agent-server/browser-with-docker", + 
"sdk/guides/agent-server/vscode-with-docker", { "group": "API Reference", "openapi": { diff --git a/sdk/guides/agent-server/api-sandboxed-server.mdx b/sdk/guides/agent-server/api-sandboxed-server.mdx new file mode 100644 index 00000000..8dc6fb5d --- /dev/null +++ b/sdk/guides/agent-server/api-sandboxed-server.mdx @@ -0,0 +1,222 @@ +--- +title: API Sandboxed Server +description: Connect to hosted API-based agent server for fully managed infrastructure. +--- + +The API Sandboxed Server demonstrates how to use APIRemoteWorkspace to connect to a hosted runtime API service. This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. + +## Basic Example + + +This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) + + +This example shows how to connect to a hosted runtime API for fully managed agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +"""Example: APIRemoteWorkspace with Dynamic Build. + +This example demonstrates building an agent-server image on-the-fly from the SDK +codebase and launching it in a remote sandboxed environment via Runtime API. 
+ +Usage: + uv run examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py + +Requirements: + - LITELLM_API_KEY: API key for LLM access + - RUNTIME_API_KEY: API key for runtime API access +""" + +import os +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import APIRemoteWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LITELLM_API_KEY") +assert api_key, "LITELLM_API_KEY required" + +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) + + +with APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:latest-python", +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() + + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback], visualize=True + ) + assert isinstance(conversation, RemoteConversation) + + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() + + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + finally: + conversation.close() +``` + +```bash Running the Example +export LITELLM_API_KEY="your-api-key" +export RUNTIME_API_KEY="your-runtime-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +``` + +## Key Concepts + +### APIRemoteWorkspace + +The `APIRemoteWorkspace` connects to a hosted runtime API service: + +```python highlight={48-52} +with APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:latest-python", +) as workspace: +``` + +This workspace type: +- Connects to a remote runtime API service +- Automatically provisions sandboxed environments +- Manages container lifecycle through the API +- Handles all infrastructure concerns + +### Runtime API Authentication + +The example requires a runtime API key for authentication: + +```python highlight={42-45} +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) +``` + +This key authenticates your requests to the hosted runtime service. + +### Pre-built Image Selection + +You can specify which pre-built agent server image to use: + +```python highlight={51} +APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:latest-python", +) +``` + +The runtime API will pull and run the specified image in a sandboxed environment. + +### Workspace Testing + +Just like with DockerWorkspace, you can test the workspace before running the agent: + +```python highlight={61-64} +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` + +This verifies connectivity to the remote runtime and ensures the environment is ready. 
+ +### Automatic RemoteConversation + +The conversation automatically uses WebSocket communication with the remote server: + +```python highlight={66-68} +conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback], visualize=True +) +assert isinstance(conversation, RemoteConversation) +``` + +All agent execution happens on the remote runtime infrastructure. + +## When to Use API Sandboxed Server + +Use a hosted runtime API when you want: + +- **Zero Infrastructure**: No servers to manage or maintain +- **Automatic Scaling**: Handle varying workloads without configuration +- **Managed Security**: Professionally maintained sandboxed environments +- **Quick Start**: Get running without Docker or server setup +- **Production Ready**: Enterprise-grade reliability and monitoring + +## Benefits + +### Operational Advantages + +- **No DevOps**: Infrastructure is managed for you +- **Always Updated**: Latest security patches and features +- **Cost Efficient**: Pay only for usage +- **Global Distribution**: Low latency from multiple regions + +### Security Advantages + +- **Isolation**: Each execution in a fresh sandboxed container +- **Monitoring**: Built-in threat detection and logging +- **Compliance**: Pre-configured security controls +- **Auditing**: Full audit trails for all operations + +## Comparison with Other Workspace Types + +| Feature | APIRemoteWorkspace | DockerWorkspace | Local Workspace | +|---------|-------------------|-----------------|-----------------| +| **Setup** | API key only | Docker required | None | +| **Infrastructure** | Fully managed | Self-managed | Local only | +| **Isolation** | High | High | None | +| **Scalability** | Automatic | Manual | N/A | +| **Best For** | Production | Control | Development | + +## Next Steps + +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Self-hosted alternative +- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)** - 
Development setup +- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Technical details diff --git a/sdk/guides/agent-server/browser-with-docker.mdx b/sdk/guides/agent-server/browser-with-docker.mdx new file mode 100644 index 00000000..fc231a58 --- /dev/null +++ b/sdk/guides/agent-server/browser-with-docker.mdx @@ -0,0 +1,237 @@ +--- +title: Browser with Docker +description: Enable browser automation with Docker-sandboxed agents for secure web interaction. +--- + +Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. + +## Basic Example + + +This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) + + +This example shows how to configure DockerWorkspace with browser capabilities and VNC access: + +```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +# Create a Docker-based remote workspace with extra ports for browser access +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, + # TODO: Change this to your platform if not linux/arm64 + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC + forward_env=["LLM_API_KEY"], # Forward API key to container +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" + + # Create agent with browser tools enabled + agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") + logger.info("šŸ“ Sending first message...") + conversation.send_message( + "Could you go to https://all-hands.dev/ blog page and summarize main " + "points of the latest blog?" 
+ ) + conversation.run() + + # Wait for user confirm to exit + y = None + while y != "y": + y = input( + "Because you've enabled extra_ports=True in DockerWorkspace, " + "you can open a browser tab to see the *actual* browser OpenHands " + "is interacting with via VNC.\n\n" + "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +``` + +## Key Concepts + +### Browser-Enabled DockerWorkspace + +The workspace is configured with extra ports for browser access: + +```python highlight={36-43} +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC + forward_env=["LLM_API_KEY"], # Forward API key to container +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" +``` + +The `extra_ports=True` setting exposes additional ports for: +- **Port 8011**: VS Code Web interface +- **Port 8012**: VNC viewer for browser visualization + +### Enabling Browser Tools + +Browser tools are enabled by setting `cli_mode=False`: + +```python highlight={46-50} +# Create agent with browser tools enabled +agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools +) +``` + +When `cli_mode=False`, the agent gains access to browser automation tools for web interaction. + +### Sending Browser Tasks + +The agent can perform web automation tasks: + +```python highlight={72-76} +logger.info("šŸ“ Sending first message...") +conversation.send_message( + "Could you go to https://all-hands.dev/ blog page and summarize main " + "points of the latest blog?" 
+) +``` + +The agent will use browser tools to navigate to the URL, read the content, and provide a summary. + +### Visual Browser Access + +With `extra_ports=True`, you can watch the browser in real-time via VNC: + +```python highlight={80-89} +y = None +while y != "y": + y = input( + "Because you've enabled extra_ports=True in DockerWorkspace, " + "you can open a browser tab to see the *actual* browser OpenHands " + "is interacting with via VNC.\n\n" + "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + +This allows you to: +- See exactly what the agent is doing in the browser +- Debug browser automation issues +- Understand agent behavior visually + +## When to Use Browser with Docker + +Use browser-enabled Docker workspaces when you need: + +- **Web Scraping**: Extract data from websites safely +- **Web Automation**: Automate web-based workflows +- **Testing**: Test web applications in isolated environments +- **Visual Monitoring**: Watch agent interactions in real-time +- **Security**: Isolate web browsing from your host system + +## Benefits + +### Isolation Benefits + +- **Secure**: Browser runs in container, not on your host +- **Clean State**: Fresh browser for each run +- **No Pollution**: No cookies, cache, or history on your machine + +### Development Benefits + +- **Visual Debugging**: Watch the browser via VNC +- **Reproducible**: Same environment every time +- **Easy Cleanup**: Container removal clears everything + +### Production Benefits + +- **Resource Control**: Limit CPU/memory for browser +- **Concurrent Sessions**: Run multiple isolated browsers +- **Monitoring**: Track browser resource usage + +## VNC Access + +The VNC interface provides real-time visual access to the browser: + +``` +http://localhost:8012/vnc.html?autoconnect=1&resize=remote +``` + +**URL Parameters:** +- `autoconnect=1`: Automatically connect to VNC server +- 
`resize=remote`: Automatically adjust resolution + +This is particularly useful for: +- Debugging navigation issues +- Verifying visual elements +- Understanding agent decision-making +- Demonstrating agent capabilities + +## Next Steps + +- **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Enable VS Code integration +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Base Docker setup +- **[Browser Tool Architecture](/sdk/arch/tools/browser_use)** - Technical details diff --git a/sdk/guides/agent-server/docker-sandboxed-server.mdx b/sdk/guides/agent-server/docker-sandboxed-server.mdx new file mode 100644 index 00000000..05870971 --- /dev/null +++ b/sdk/guides/agent-server/docker-sandboxed-server.mdx @@ -0,0 +1,264 @@ +--- +title: Docker Sandboxed Server +description: Run agents in isolated Docker containers for security and reproducibility. +--- + +The Docker Sandboxed Server demonstrates how to run agents in isolated Docker containers using DockerWorkspace. This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely. 
+ +## Basic Example + + +This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) + + +This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +# 2) Create a Docker-based remote workspace that will set up and manage +# the Docker container automatically +with DockerWorkspace( + # dynamically build agent-server image + # base_image="nikolaik/python-nodejs:python3.12-nodejs22", + # use pre-built image for faster startup + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, + platform=detect_platform(), + forward_env=["LLM_API_KEY"], # Forward API key to container +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": 
time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") + + logger.info("šŸ“ Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("šŸš€ Running conversation...") + conversation.run() + logger.info("āœ… First task completed!") + logger.info(f"Agent status: {conversation.state.agent_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("ā³ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("āœ… Events have stopped") + + logger.info("šŸš€ Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("āœ… Second task completed!") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py +``` + +## Key Concepts + +### DockerWorkspace Context Manager + +The `DockerWorkspace` uses a context manager to automatically handle container lifecycle: + +```python highlight={42-50} +with DockerWorkspace( + # use pre-built image for faster startup + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, + platform=detect_platform(), + forward_env=["LLM_API_KEY"], # Forward API key to container +) as workspace: + # Container is running here + # Work with the workspace + pass +# Container is automatically stopped and cleaned up here +``` + +The workspace automatically: +- Pulls or builds the Docker image +- Starts the container with an agent server +- Waits for the server to be ready +- Cleans up the container when done + +### Platform Detection + +The example includes platform detection to ensure the correct Docker image is used: + +```python highlight={32-37} +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" +``` + +This ensures compatibility across different CPU architectures (Intel/AMD vs ARM/Apple Silicon). + +### Environment Forwarding + +You can forward environment variables from your host to the container: + +```python highlight={49} +DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, + platform=detect_platform(), + forward_env=["LLM_API_KEY"], # Forward API key to container +) +``` + +This allows the agent running inside the container to access necessary credentials. 
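The allow-list semantics of `forward_env` can be illustrated with plain Python (no Docker required). `forwarded_env` is a hypothetical helper written for this sketch, not part of the SDK:

```python
# Hypothetical helper (not SDK API) showing the allow-list idea behind
# forward_env: only explicitly named variables cross into the container.
def forwarded_env(names: list[str], source: dict[str, str]) -> dict[str, str]:
    return {name: source[name] for name in names if name in source}

host_env = {"LLM_API_KEY": "sk-test", "HOME": "/root", "SHELL": "/bin/bash"}
child_env = forwarded_env(["LLM_API_KEY"], host_env)
# Only LLM_API_KEY is forwarded; HOME and SHELL stay on the host side.
```

In this sketch, only the listed key crosses the boundary; everything else stays on the host.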
+ +### Testing the Workspace + +Before creating a conversation, the example tests the workspace connection: + +```python highlight={68-74} +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" +) +logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" +) +logger.info(f"Output: {result.stdout}") +``` + +This verifies the workspace is properly initialized and can execute commands. + +### Automatic RemoteConversation + +When you use a DockerWorkspace, the Conversation automatically becomes a RemoteConversation: + +```python highlight={75-81} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` + +The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming. + +## When to Use Docker Sandboxed Server + +Use Docker containers when you need: + +- **Security**: Complete isolation from host system +- **Production**: Deploy agents in controlled environments +- **Testing**: Clean, reproducible test environments +- **Multi-tenant**: Isolate different users or workloads +- **Resource Control**: Set CPU/memory limits per container + +## Configuration Options + +### Pre-built vs Base Images + +```python +# āœ… Fast: Use pre-built image (recommended) +DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, +) + +# ā±ļø Slower: Build from base image (more control) +DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, +) +``` + +Pre-built images start immediately, while base images need to build the agent server first. 
+ +### Resource Limits + +When running Docker containers, you can set resource limits: + +```bash +docker run --memory="2g" --cpus="1.5" \ + -e LLM_API_KEY="your-api-key" \ + ghcr.io/openhands/agent-server:latest-python +``` + +## Next Steps + +- **[Browser with Docker](/sdk/guides/agent-server/browser-with-docker)** - Add browser capabilities +- **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Enable VS Code tools +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server)** - Use managed hosting diff --git a/sdk/guides/agent-server/local-agent-server.mdx b/sdk/guides/agent-server/local-agent-server.mdx new file mode 100644 index 00000000..36420dc6 --- /dev/null +++ b/sdk/guides/agent-server/local-agent-server.mdx @@ -0,0 +1,367 @@ +--- +title: Local Agent Server +description: Run agents through a local HTTP server with RemoteConversation for client-server architecture. +--- + +The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using RemoteConversation. This enables separation between client code and agent execution, making it possible to run agents on dedicated servers or different machines. 
+ +## Basic Example + + +This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) + + +This example shows how to programmatically start a local agent server and interact with it through a RemoteConversation: + +```python icon="python" expandable examples/02_remote_agent_server/01_convo_with_local_agent_server.py +import os +import subprocess +import sys +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, RemoteConversation, Workspace, get_logger +from openhands.sdk.event import ConversationStateUpdateEvent +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +def _stream_output(stream, prefix, target_stream): + """Stream output from subprocess to target stream with prefix.""" + try: + for line in iter(stream.readline, ""): + if line: + target_stream.write(f"[{prefix}] {line}") + target_stream.flush() + except Exception as e: + print(f"Error streaming {prefix}: {e}", file=sys.stderr) + finally: + stream.close() + + +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __init__(self, port: int = 8000, host: str = "127.0.0.1"): + self.port: int = port + self.host: str = host + self.process: subprocess.Popen[bytes] | None = None + self.base_url: str = f"http://{host}:{port}" + self.stdout_thread: threading.Thread | None = None + self.stderr_thread: threading.Thread | None = None + + def __enter__(self): + """Start the API server subprocess.""" + print(f"Starting OpenHands API server on {self.base_url}...") + + # Start the server process + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + 
env={"LOG_JSON": "true", **os.environ}, + ) + + # Start threads to stream stdout and stderr + self.stdout_thread = threading.Thread( + target=_stream_output, + args=(self.process.stdout, "SERVER", sys.stdout), + daemon=True, + ) + self.stderr_thread = threading.Thread( + target=_stream_output, + args=(self.process.stderr, "SERVER", sys.stderr), + daemon=True, + ) + + self.stdout_thread.start() + self.stderr_thread.start() + + # Wait for server to be ready + max_retries = 30 + for i in range(max_retries): + try: + import httpx + + response = httpx.get(f"{self.base_url}/health", timeout=1.0) + if response.status_code == 200: + print(f"API server is ready at {self.base_url}") + return self + except Exception: + pass + + if self.process.poll() is not None: + # Process has terminated + raise RuntimeError( + "Server process terminated unexpectedly. " + "Check the server logs above for details." + ) + + time.sleep(1) + + raise RuntimeError(f"Server failed to start after {max_retries} seconds") + + def __exit__(self, exc_type, exc_val, exc_tb): + """Stop the API server subprocess.""" + if self.process: + print("Stopping API server...") + self.process.terminate() + try: + self.process.wait(timeout=5) + except subprocess.TimeoutExpired: + print("Force killing API server...") + self.process.kill() + self.process.wait() + + # Wait for streaming threads to finish (they're daemon threads, + # so they'll stop automatically) + # But give them a moment to flush any remaining output + time.sleep(0.5) + print("API server stopped.") + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) +title_gen_llm = LLM( + usage_id="title-gen-llm", + model="litellm_proxy/openai/gpt-5-mini", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + +# Use managed API server +with ManagedAPIServer(port=8001) as server: + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, # Disable browser tools for simplicity + ) + + # Define callbacks to test the WebSocket functionality + received_events = [] + event_tracker = {"last_event_time": time.time()} + + def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() + + # Create RemoteConversation with callbacks + # NOTE: Workspace is required for RemoteConversation + workspace = Workspace(host=server.base_url) + result = workspace.execute_command("pwd") + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") + + # Send first message and run + logger.info("šŸ“ Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." 
+ ) + + # Generate title using a specific LLM + title = conversation.generate_title(max_length=60, llm=title_gen_llm) + logger.info(f"Generated conversation title: {title}") + + logger.info("šŸš€ Running conversation...") + conversation.run() + + logger.info("āœ… First task completed!") + logger.info(f"Agent status: {conversation.state.agent_status}") + + # Wait for events to stop coming (no events for 2 seconds) + logger.info("ā³ Waiting for events to stop...") + while time.time() - event_tracker["last_event_time"] < 2.0: + time.sleep(0.1) + logger.info("āœ… Events have stopped") + + logger.info("šŸš€ Running conversation again...") + conversation.send_message("Great! Now delete that file.") + conversation.run() + logger.info("āœ… Second task completed!") + + # Demonstrate state.events functionality + logger.info("\n" + "=" * 50) + logger.info("šŸ“Š Demonstrating State Events API") + logger.info("=" * 50) + + # Count total events using state.events + total_events = len(conversation.state.events) + logger.info(f"šŸ“ˆ Total events in conversation: {total_events}") + + # Get recent events (last 5) using state.events + logger.info("\nšŸ” Getting last 5 events using state.events...") + all_events = conversation.state.events + recent_events = all_events[-5:] if len(all_events) >= 5 else all_events + + for i, event in enumerate(recent_events, 1): + event_type = type(event).__name__ + timestamp = getattr(event, "timestamp", "Unknown") + logger.info(f" {i}. 
{event_type} at {timestamp}") + + # Let's see what the actual event types are + logger.info("\nšŸ” Event types found:") + event_types = set() + for event in recent_events: + event_type = type(event).__name__ + event_types.add(event_type) + for event_type in sorted(event_types): + logger.info(f" - {event_type}") + + # Print all ConversationStateUpdateEvent + logger.info("\nšŸ—‚ļø ConversationStateUpdateEvent events:") + for event in conversation.state.events: + if isinstance(event, ConversationStateUpdateEvent): + logger.info(f" - {event}") + + finally: + # Clean up + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/01_convo_with_local_agent_server.py +``` + +## Key Concepts + +### Managed API Server + +The example includes a `ManagedAPIServer` context manager that handles starting and stopping the server subprocess: + +```python highlight={42-61} +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __enter__(self): + """Start the API server subprocess.""" + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + env={"LOG_JSON": "true", **os.environ}, + ) +``` + +The server starts with `python -m openhands.agent_server` and automatically handles health checks to ensure it's ready before proceeding. + +### Remote Workspace + +When using a remote server, you need to provide a `Workspace` that connects to that server: + +```python highlight={157-158} +workspace = Workspace(host=server.base_url) +result = workspace.execute_command("pwd") +``` + +The `Workspace` object communicates with the remote server's API to execute commands and manage files. 
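The readiness check that `ManagedAPIServer.__enter__` performs (polling `/health` with a retry budget) is a general pattern. Here it is sketched without HTTP, using a fake probe; `wait_until_ready` and `fake_probe` are illustrative names, not SDK API:

```python
import time

# Generic readiness poll: call `probe` until it reports success or the
# retry budget is exhausted. ManagedAPIServer does the same thing with
# an HTTP GET against the server's /health endpoint.
def wait_until_ready(probe, max_retries: int = 30, delay: float = 0.0) -> bool:
    for _ in range(max_retries):
        if probe():
            return True
        time.sleep(delay)
    return False

attempts = {"count": 0}

def fake_probe() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3  # "server" becomes healthy on the third poll

ready = wait_until_ready(fake_probe, max_retries=5)
```

If the budget runs out, the caller can raise, mirroring the `RuntimeError` the example raises after `max_retries` failed health checks.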
+ +### RemoteConversation + +When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation`: + +```python highlight={164-170} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` + +RemoteConversation handles communication with the remote agent server over WebSocket for real-time event streaming. + +### Event Callbacks + +Callbacks receive events in real-time as they happen on the remote server: + +```python highlight={148-153} +def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() +``` + +This enables monitoring agent activity, tracking progress, and implementing custom event handling logic. + +### Conversation State + +The conversation state provides access to all events and status: + +```python highlight={208-214} +# Count total events using state.events +total_events = len(conversation.state.events) +logger.info(f"šŸ“ˆ Total events in conversation: {total_events}") + +# Get recent events (last 5) using state.events +all_events = conversation.state.events +recent_events = all_events[-5:] if len(all_events) >= 5 else all_events +``` + +This allows you to inspect the conversation history, analyze agent behavior, and build custom monitoring tools. 
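The event-inspection pattern above is plain list handling and needs nothing SDK-specific. Isolated here with stand-in event classes (the names are placeholders for real SDK event types):

```python
# Stand-in event classes; real SDK events are inspected the same way,
# by type name.
class MessageEvent: ...
class ActionEvent: ...

events = [
    MessageEvent(), ActionEvent(), ActionEvent(),
    MessageEvent(), ActionEvent(), ActionEvent(),
]

total_events = len(events)
recent_events = events[-5:] if len(events) >= 5 else events
event_types = sorted({type(e).__name__ for e in recent_events})
```

The same slicing and type-name grouping works directly on `conversation.state.events`.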
+ +## When to Use Local Agent Server + +The local agent server pattern is useful when you want to: + +- **Separate Concerns**: Keep client code separate from agent execution +- **Development**: Test remote conversation features locally +- **Multi-Client**: Allow multiple clients to connect to the same server +- **Debugging**: Inspect server logs and behavior independently +- **Preparation**: Prepare code for deploying to a real remote server + +## Next Steps + +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Run server in Docker for isolation +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server)** - Connect to hosted API service +- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Deep dive into server internals diff --git a/sdk/guides/agent-server/overview.mdx b/sdk/guides/agent-server/overview.mdx new file mode 100644 index 00000000..70632ef5 --- /dev/null +++ b/sdk/guides/agent-server/overview.mdx @@ -0,0 +1,315 @@ +--- +title: Remote Agent Server Overview +description: Run agents on remote servers with isolated workspaces for production deployments. +--- + +The Agent SDK supports both standalone and client-server architectures. Remote Agent Servers enable you to run agents on dedicated infrastructure while keeping client code separate, providing isolation, scalability, and centralized management. + +## What is a Remote Agent Server? + +A Remote Agent Server is an HTTP/WebSocket server that: +- **Runs agents** on dedicated infrastructure +- **Manages workspaces** (Docker containers or remote VMs) +- **Streams events** to clients via WebSocket +- **Handles file operations** (upload, download, editing) +- **Provides isolation** between different agent executions + +Think of it as the "backend" for your agent, while your Python code acts as the "frontend" client. 
+ +## Architecture Overview + +```mermaid +graph TD + Client[Client Code] -->|HTTP/WebSocket| Server[Agent Server] + Server --> Workspace[Workspace] + + subgraph Workspace Types + Workspace --> Local[Local Workspace] + Workspace --> Docker[Docker Container] + Workspace --> API[Remote API VM] + end + + Local --> Files[File System] + Docker --> Container[Isolated Container] + API --> Cloud[Cloud Infrastructure] + + style Client fill:#e1f5fe + style Server fill:#fff3e0 + style Workspace fill:#e8f5e8 +``` + +## How Remote Conversations Work + +### 1. Workspace Connection + +When you create a `Conversation` with a remote workspace, it automatically becomes a `RemoteConversation`: + +```python +from openhands.sdk import Conversation +from openhands.workspace import DockerWorkspace + +with DockerWorkspace(server_image="ghcr.io/openhands/agent-server:latest") as workspace: + # Conversation automatically detects remote workspace + conversation = Conversation(agent=agent, workspace=workspace) + # conversation is now a RemoteConversation +``` + +### 2. Agent Server Initialization + +The workspace (DockerWorkspace or APIRemoteWorkspace) automatically: +- Starts an agent server in the workspace +- Waits for the server to be ready +- Provides the server URL to the conversation + +### 3. Event Streaming + +The RemoteConversation establishes a WebSocket connection to stream events: + +```python +def event_callback(event): + print(f"Received: {type(event).__name__}") + +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback] # Callbacks receive real-time events +) +``` + +Events flow: +1. Client sends message via HTTP +2. Agent server executes agent logic +3. Events stream back via WebSocket +4. Client callbacks receive events in real-time + +### 4. 
File Operations + +File operations go through the agent server API: + +```python +# Upload files to workspace +workspace.upload_file(local_path, remote_path) + +# Download files from workspace +workspace.download_file(remote_path, local_path) + +# Execute commands +result = workspace.execute_command("ls -la") +print(result.stdout) +``` + +## Workspace Types + +The SDK provides three workspace types for different use cases: + +### Local Workspace (Development) + +- **Description**: Direct execution on your machine +- **Setup**: None (default) +- **Use Case**: Quick development and testing +- **Security**: āš ļø Low - full host access + +```python +from openhands.sdk import Conversation + +# Local workspace is implicit +conversation = Conversation(agent=agent) +# Runs directly on your machine +``` + +### Docker Workspace (Self-Hosted) + +- **Description**: Isolated Docker containers +- **Setup**: Docker required +- **Use Case**: Production, testing, multi-tenant +- **Security**: āœ… High - complete isolation + +```python +from openhands.workspace import DockerWorkspace + +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) + # Runs in isolated Docker container +``` + +### API Remote Workspace (Managed) + +- **Description**: Hosted runtime API service +- **Setup**: API key only +- **Use Case**: Zero-ops production +- **Security**: āœ… High - managed isolation + +```python +from openhands.workspace import APIRemoteWorkspace + +with APIRemoteWorkspace( + runtime_api_url="https://runtime.example.com", + runtime_api_key="your-key" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) + # Runs on managed infrastructure +``` + +## Comparison Table + +| Feature | Local | Docker | API Remote | +|---------|-------|--------|------------| +| **Setup** | None | Docker | API key | +| **Isolation** | None | High | High | +| **Infrastructure** 
| Local | Self-managed | Managed | +| **Scalability** | N/A | Manual | Automatic | +| **Cost** | Free | Infrastructure | Usage-based | +| **Best For** | Development | Control & Testing | Production | + +## When to Use Remote Agent Servers + +### Use Remote Servers When You Need: + +**Isolation** +- Protect host system from agent actions +- Run untrusted or experimental code safely +- Separate different agent workloads + +**Scalability** +- Handle multiple concurrent users +- Auto-scale based on demand +- Distribute workload across servers + +**Production Deployment** +- Centralized agent management +- Monitoring and logging +- Resource control and limits + +**Team Collaboration** +- Multiple developers sharing infrastructure +- Consistent development environments +- Centralized configuration + +### Use Local Workspace When: + +- Rapid development iteration +- Direct access to local files needed +- No isolation requirements +- Single-user development + +## Getting Started + +### Quick Start: Local Agent Server + +Test remote conversation features locally: + +```python +from openhands.sdk import Workspace, Conversation + +# Start local agent server +workspace = Workspace(host="http://localhost:8000") + +conversation = Conversation(agent=agent, workspace=workspace) +conversation.send_message("Hello!") +conversation.run() +``` + +See: [Local Agent Server](/sdk/guides/agent-server/local-agent-server) + +### Production: Docker Sandboxed Server + +Run agents in isolated Docker containers: + +```python +from openhands.workspace import DockerWorkspace + +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) + conversation.send_message("Deploy my application") + conversation.run() +``` + +See: [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server) + +### Zero-Ops: API Sandboxed Server + +Use managed infrastructure: + +```python +from 
openhands.workspace import APIRemoteWorkspace + +with APIRemoteWorkspace( + runtime_api_url="https://runtime.example.com", + runtime_api_key="your-key" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) + conversation.send_message("Analyze this codebase") + conversation.run() +``` + +See: [API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server) + +## Advanced Features + +### Browser Automation + +Enable browser tools in Docker: + +```python +from openhands.workspace import DockerWorkspace + +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + extra_ports=True # Enables VNC access +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=False) # Browser enabled + conversation = Conversation(agent=agent, workspace=workspace) +``` + +See: [Browser with Docker](/sdk/guides/agent-server/browser-with-docker) + +### VS Code Integration + +Access VS Code Web in container: + +```python +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + extra_ports=True # Enables VS Code Web +) as workspace: + # VS Code available at http://localhost:8011 + conversation = Conversation(agent=agent, workspace=workspace) +``` + +See: [VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker) + +## Security Considerations + +### Docker Isolation + +Docker workspaces provide: +- **Process isolation**: Agent cannot affect host processes +- **File system isolation**: Agent cannot access host files +- **Network isolation**: Controlled network access +- **Resource limits**: CPU/memory constraints + +### Best Practices + +1. **Use Pre-built Images**: Faster startup, verified content +2. **Set Resource Limits**: Prevent resource exhaustion +3. **Enable Authentication**: Protect server endpoints +4. **Monitor Activity**: Log and audit agent actions +5. 
**Update Regularly**: Keep server images current + +## Next Steps + +Explore different deployment options: + +- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)** - Development setup +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Self-hosted production +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server)** - Managed hosting +- **[Browser with Docker](/sdk/guides/agent-server/browser-with-docker)** - Web automation +- **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Code editing + +For architectural details: +- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Technical deep dive +- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Workspace internals \ No newline at end of file diff --git a/sdk/guides/agent-server/vscode-with-docker.mdx b/sdk/guides/agent-server/vscode-with-docker.mdx new file mode 100644 index 00000000..29c329a9 --- /dev/null +++ b/sdk/guides/agent-server/vscode-with-docker.mdx @@ -0,0 +1,292 @@ +--- +title: VS Code with Docker +description: Enable VS Code Web integration for interactive code editing with Docker-sandboxed agents. +--- + +VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with. 
+ +## Basic Example + + +This example is available on GitHub: [examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py) + + +This example shows how to configure DockerWorkspace with VS Code Web access: + +```python icon="python" expandable examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py +import os +import time + +import httpx +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + +# Create a Docker-based remote workspace with extra ports for VSCode access +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=18010, + # TODO: Change this to your platform if not linux/arm64 + platform="linux/arm64", + extra_ports=True, # Expose extra ports for VSCode and VNC + forward_env=["LLM_API_KEY"], # Forward API key to container +) as workspace: + """Extra ports allows you to access VSCode at localhost:8011""" + + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create 
RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") + logger.info("šŸ“ Sending first message...") + conversation.send_message("Create a simple Python script that prints Hello World") + conversation.run() + + # Get VSCode URL with token + vscode_port = (workspace.host_port or 8010) + 1 + try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) + except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" + + # Wait for user to explore VSCode + y = None + while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py +``` + +## Key Concepts + +### VS Code-Enabled DockerWorkspace + +The workspace is configured with extra ports for VS Code access: + +```python highlight={27-34} +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=18010, + 
platform="linux/arm64", + extra_ports=True, # Expose extra ports for VSCode and VNC + forward_env=["LLM_API_KEY"], # Forward API key to container +) as workspace: + """Extra ports allows you to access VSCode at localhost:8011""" +``` + +The `extra_ports=True` setting exposes: +- **Port 8011**: VS Code Web interface (host_port + 1) +- **Port 8012**: VNC viewer for visual access + +### VS Code URL Generation + +The example retrieves the VS Code URL with authentication token: + +```python highlight={68-86} +# Get VSCode URL with token +vscode_port = (workspace.host_port or 8010) + 1 +try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) +except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" +``` + +This generates a properly authenticated URL with the workspace directory pre-opened. + +### Agent Task Execution + +The agent creates files that you can then inspect in VS Code: + +```python highlight={62-65} +logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") +logger.info("šŸ“ Sending first message...") +conversation.send_message("Create a simple Python script that prints Hello World") +conversation.run() +``` + +After the agent completes the task, you can open VS Code to see the generated files. 
+ +### Interactive VS Code Access + +The example waits for user confirmation before exiting: + +```python highlight={88-102} +y = None +while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + +This gives you time to explore the workspace in VS Code before the container is cleaned up. + +## When to Use VS Code with Docker + +Use VS Code-enabled Docker workspaces when you need: + +- **Code Inspection**: Review files created by the agent +- **Manual Editing**: Make manual corrections or additions +- **Debugging**: Investigate issues in the workspace +- **Learning**: Understand what the agent is doing +- **Collaboration**: Share workspace access with team members + +## Benefits + +### Development Benefits + +- **Full IDE Experience**: Complete VS Code features in browser +- **No Local Setup**: No need to install VS Code locally +- **Isolated Environment**: All edits happen in container +- **Pre-configured**: Optimized settings for agent workflows + +### VS Code Features Available + +The VS Code Web instance includes: +- **Syntax Highlighting**: For all major languages +- **File Explorer**: Navigate workspace structure +- **Search**: Find text across files +- **Terminal**: Execute commands in container +- **Extensions**: Pre-installed development extensions + +### OpenHands-Optimized Settings + +The VS Code instance comes with: +- **Dark Theme**: Better visibility +- **Auto-save**: Automatic file saving +- **Telemetry Disabled**: Privacy-focused +- **Auto-updates Disabled**: Consistent environment + +## Accessing VS Code + +The VS Code URL format is: + +``` 
+http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir} +``` + +**URL Components:** +- `vscode_port`: Usually host_port + 1 (e.g., 8011) +- `tkn`: Authentication token for security +- `folder`: Workspace directory to open + +## Use Cases + +### Code Review + +After the agent generates code: +1. Agent creates files +2. Open VS Code URL in browser +3. Review code structure and quality +4. Make manual adjustments if needed + +### Debugging + +When the agent encounters issues: +1. Agent attempts to solve problem +2. Open VS Code to inspect state +3. Identify the issue manually +4. Guide the agent with additional instructions + +### Learning + +To understand agent behavior: +1. Give agent a complex task +2. Watch it work through the problem +3. Open VS Code to see incremental changes +4. Learn agent's problem-solving approach + +## Next Steps + +- **[Browser with Docker](/sdk/guides/agent-server/browser-with-docker)** - Add browser capabilities +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Base Docker setup +- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)** - Development basics diff --git a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx b/sdk/guides/remote-agent-server/api-sandboxed-server.mdx deleted file mode 100644 index 9f8bef79..00000000 --- a/sdk/guides/remote-agent-server/api-sandboxed-server.mdx +++ /dev/null @@ -1,42 +0,0 @@ ---- -title: API Sandboxed Server -description: Connect to hosted API-based agent server for fully managed infrastructure. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) - - -Connect to a hosted API-based agent server for fully managed infrastructure without running your own server. 
- -## How to Run - -```bash -export LLM_API_KEY="your-api-key" -export AGENT_SERVER_URL="https://api.openhands.ai" -export AGENT_SERVER_API_KEY="your-server-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py -``` - -## Key Concept - -```python -conversation = RemoteConversation( - agent_server_url="https://api.openhands.ai", - api_key=server_api_key -) -``` - -No server management required - connect to hosted API. - -## Benefits - -- **Zero Ops** - No server management -- **Scalability** - Auto-scaling infrastructure -- **Reliability** - Managed uptime and monitoring - -## Related Documentation - -- [Agent Server Architecture](/sdk/arch/agent_server/overview) -- [Remote Workspace](/sdk/arch/workspace/remote_api) diff --git a/sdk/guides/remote-agent-server/browser-with-docker.mdx b/sdk/guides/remote-agent-server/browser-with-docker.mdx deleted file mode 100644 index a3230976..00000000 --- a/sdk/guides/remote-agent-server/browser-with-docker.mdx +++ /dev/null @@ -1,44 +0,0 @@ ---- -title: Browser with Docker Sandboxed Server -description: Use browser tools with Docker-sandboxed agent server for web automation. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) - - -Combine browser automation capabilities with Docker isolation for secure web interaction. 
- -## How to Run - -```bash -# Start server with browser support -docker run -p 8000:8000 \ - -e LLM_API_KEY="your-api-key" \ - ghcr.io/all-hands-ai/runtime:latest-browser - -# Run client -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py -``` - -## Key Concept - -```python -conversation = RemoteConversation(agent_server_url="http://localhost:8000") -conversation.send_message("Navigate to GitHub and search for OpenHands") -``` - -Browser tools run in isolated Docker container with the agent. - -## Benefits - -- **Secure Browsing** - Isolate web interactions -- **Clean Environment** - Fresh browser state for each session -- **Resource Control** - Limit browser resource usage - -## Related Documentation - -- [Browser Tool](/sdk/arch/tools/browser_use) -- [Docker Workspace](/sdk/arch/workspace/docker) diff --git a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx b/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx deleted file mode 100644 index 8c7967e3..00000000 --- a/sdk/guides/remote-agent-server/docker-sandboxed-server.mdx +++ /dev/null @@ -1,184 +0,0 @@ ---- -title: Docker Workspace & Sandboxed Server -description: Run agents in isolated Docker containers for security and reproducibility. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) - - -Docker workspaces provide complete isolation by running agents in containers. Use for production deployments, testing, and untrusted code execution. - -## DockerWorkspace - -Execute in isolated Docker containers with security boundaries. 
- -### Direct Usage - -```python -from openhands.workspace import DockerWorkspace -from openhands.sdk import Conversation - -workspace = DockerWorkspace( - working_dir="/workspace", - base_image="python:3.12" -) - -with workspace: - conversation = Conversation(agent=agent, workspace=workspace) - conversation.send_message("Build a web server") - conversation.run() -# Container automatically cleaned up -``` - -See [`01_docker_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_docker_workspace.py) - -### When to Use - -- **Production** - Isolated execution environment -- **Testing** - Clean, reproducible environments -- **Untrusted code** - Run agent in sandbox -- **Multi-user** - Each user gets isolated container - -### Configuration Options - -```python -DockerWorkspace( - working_dir="/workspace", - base_image="ubuntu:22.04", # Build from base image - # OR - server_image="ghcr.io/all-hands-ai/agent-server:latest", # Pre-built image - host_port=None, # Auto-assign port - platform="linux/amd64" # Platform override -) -``` - -### Pre-built Images - -Use pre-built images for faster startup: - -```python -workspace = DockerWorkspace( - working_dir="/workspace", - server_image="ghcr.io/all-hands-ai/agent-server:latest" -) -``` - -No build time - container starts immediately. - -### File Transfer - -Copy files to/from container: - -```python -# Upload file -workspace.upload_file("/local/path/file.txt", "/workspace/file.txt") - -# Download file -workspace.download_file("/workspace/output.txt", "/local/path/output.txt") -``` - -## Docker Sandboxed Server - -Run agent server in Docker and connect remotely. 
- -### How to Run - -```bash -# Start server in Docker -docker run -p 8000:8000 \ - -e LLM_API_KEY="your-api-key" \ - ghcr.io/all-hands-ai/runtime:latest - -# Run client -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py -``` - -### Client Connection - -```python -from openhands.sdk import RemoteConversation - -conversation = RemoteConversation( - agent_server_url="http://localhost:8000", - api_key=api_key -) -conversation.send_message("Your task") -conversation.run() -``` - -## Benefits - -**Security:** -- Complete isolation from host system -- Agent cannot access host files -- Agent cannot affect host processes - -**Resources:** -- Control CPU/memory limits -- Monitor container resource usage -- Kill containers if needed - -**Reproducibility:** -- Consistent environment across deployments -- Version-controlled container images -- Easy rollback to previous versions - -## Docker vs Local Workspace - -| Feature | LocalWorkspace | DockerWorkspace | -|---------|----------------|-----------------| -| **Security** | Low (host access) | High (isolated) | -| **Setup** | None | Docker required | -| **Performance** | Fast | Slight overhead | -| **Cleanup** | Manual | Automatic | -| **Best for** | Development | Production | - -## Best Practices - -### 1. Use Pre-built Images - -```python -# āœ… Good: Fast startup -server_image="ghcr.io/all-hands-ai/agent-server:latest" - -# āŒ Slow: Builds on every run -base_image="python:3.12" -``` - -### 2. Clean Up Containers - -Use context manager for automatic cleanup: - -```python -with workspace: - # Work with workspace - pass -# Container automatically removed -``` - -### 3. Resource Limits - -Set Docker resource limits: - -```bash -docker run --memory="2g" --cpus="1.5" \ - ghcr.io/all-hands-ai/runtime:latest -``` - -### 4. 
Volume Mounts - -Mount local directories for persistent data: - -```bash -docker run -v /local/data:/workspace/data \ - ghcr.io/all-hands-ai/runtime:latest -``` - -## Related Documentation - -- **[Browser with Docker](/sdk/guides/remote-agent-server/browser-with-docker)** - Browser in container -- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design -- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details diff --git a/sdk/guides/remote-agent-server/local-agent-server.mdx b/sdk/guides/remote-agent-server/local-agent-server.mdx deleted file mode 100644 index c08c9c8d..00000000 --- a/sdk/guides/remote-agent-server/local-agent-server.mdx +++ /dev/null @@ -1,91 +0,0 @@ ---- -title: Local Agent Server & Workspaces -description: Understand workspaces and run agent server locally for client-server architecture. ---- - - -This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) - - -Workspaces define where agents execute commands and access files. This guide introduces workspace concepts and demonstrates the local agent server setup. - -## Workspace Types - -| Type | Security | Setup | Use Case | -|------|----------|-------|----------| -| **LocalWorkspace** | Low (host access) | None | Development | -| **DockerWorkspace** | High (isolated) | Docker | Testing, Production | -| **RemoteAPIWorkspace** | High (isolated) | Server | Multi-user, Cloud | - -## LocalWorkspace - -Execute directly on your machine - default for standalone SDK. - -### Usage - -```python -from openhands.sdk import Conversation - -# LocalWorkspace is implicit (no workspace parameter needed) -conversation = Conversation(agent=agent) -conversation.send_message("Create a Python script") -conversation.run() -``` - -Operations run in current working directory with direct host access. 
- -### When to Use - -- **Development** - Quick iteration and testing -- **Local files** - Direct access to local filesystem -- **Simple tasks** - No isolation needed - -### Security Considerations - -āš ļø **Warning**: Agent has full host access: -- Can modify any accessible files -- Can execute any commands -- **Not recommended for production or untrusted code** - -## Remote Agent Server - -Run agent server and connect remotely for resource isolation and scalability. - -### How to Run - -```bash -# Terminal 1: Start server -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python -m openhands.agent_server - -# Terminal 2: Run client -export LLM_API_KEY="your-api-key" -uv run python examples/02_remote_agent_server/01_convo_with_local_agent_server.py -``` - -### Client Connection - -```python -from openhands.sdk import RemoteConversation - -conversation = RemoteConversation( - agent_server_url="http://localhost:8000", - api_key=api_key -) -conversation.send_message("Your task") -conversation.run() -``` - -### Benefits - -- **Resource Isolation** - Server handles compute-intensive tasks -- **Scalability** - Multiple clients connect to same server -- **Deployment** - Separate client and execution environments -- **Security** - Isolate agent execution from client - -## Related Documentation - -- **[Docker Sandboxed Server](/sdk/guides/remote-agent-server/docker-sandboxed-server)** - Isolated execution -- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Server details -- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Technical design diff --git a/sdk/guides/remote-agent-server/vscode-with-docker.mdx b/sdk/guides/remote-agent-server/vscode-with-docker.mdx deleted file mode 100644 index 78aa7598..00000000 --- a/sdk/guides/remote-agent-server/vscode-with-docker.mdx +++ /dev/null @@ -1,43 +0,0 @@ ---- -title: VS Code with Docker Sandboxed Server -description: Enable VS Code integration for code editing with Docker-sandboxed agent. 
---- - - -This example is available on GitHub: [examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py) - - -Use VS Code tools with Docker-sandboxed agent server for code editing and development workflows. - -## How to Run - -```bash -# Start server with VS Code support -docker run -p 8000:8000 \ - -e LLM_API_KEY="your-api-key" \ - ghcr.io/all-hands-ai/runtime:latest-vscode - -# Run client -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py -``` - -## Key Concept - -```python -conversation = RemoteConversation(agent_server_url="http://localhost:8000") -conversation.send_message("Create a Python Flask app with routes") -``` - -Agent uses VS Code tools for editing, navigation, and refactoring in isolated environment. - -## Benefits - -- **Rich Code Editing** - VS Code features in agent workflows -- **Isolated Development** - Safe code changes in container -- **Full IDE Features** - Syntax highlighting, auto-complete, etc. - -## Related Documentation - -- [Docker Workspace](/sdk/arch/workspace/docker) From 5a283987461587eec8fc6e107858f99a08c5caa5 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 13:03:12 -0400 Subject: [PATCH 27/58] rename --- sdk/guides/agent-server/overview.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sdk/guides/agent-server/overview.mdx b/sdk/guides/agent-server/overview.mdx index 70632ef5..3cf82673 100644 --- a/sdk/guides/agent-server/overview.mdx +++ b/sdk/guides/agent-server/overview.mdx @@ -1,5 +1,5 @@ --- -title: Remote Agent Server Overview +title: Overview description: Run agents on remote servers with isolated workspaces for production deployments. 
--- @@ -312,4 +312,4 @@ Explore different deployment options: For architectural details: - **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Technical deep dive -- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Workspace internals \ No newline at end of file +- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Workspace internals From 6424cbceda5ba18a9c507224ea363953c81ac9c9 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 16:31:29 -0400 Subject: [PATCH 28/58] improve agent-server overview --- sdk/guides/agent-server/overview.mdx | 330 +++++++-------------------- sdk/guides/hello-world.mdx | 2 +- 2 files changed, 88 insertions(+), 244 deletions(-) diff --git a/sdk/guides/agent-server/overview.mdx b/sdk/guides/agent-server/overview.mdx index 3cf82673..6480a4d3 100644 --- a/sdk/guides/agent-server/overview.mdx +++ b/sdk/guides/agent-server/overview.mdx @@ -3,302 +3,146 @@ title: Overview description: Run agents on remote servers with isolated workspaces for production deployments. --- -The Agent SDK supports both standalone and client-server architectures. Remote Agent Servers enable you to run agents on dedicated infrastructure while keeping client code separate, providing isolation, scalability, and centralized management. +Remote Agent Servers package the Agent SDK into containers you can deploy anywhere (Kubernetes, VMs, on‑prem, any cloud) with strong isolation. The remote path uses the exact same SDK API as local—switching is just changing the workspace argument; your Conversation code stays the same. 
+ + +For example, switching from a local workspace to a Docker‑based remote agent server: + +```python lines +# Local → Docker +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import DockerWorkspace # [!code ++] +with DockerWorkspace( # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` + +Or switching to an API‑based remote workspace (via [OpenHands Runtime API](https://runtime.all-hands.dev/)): + +```python lines +# Local → Remote API +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import APIRemoteWorkspace # [!code ++] +with APIRemoteWorkspace( # [!code ++] + runtime_api_url="https://runtime.eval.all-hands.dev", # [!code ++] + runtime_api_key="YOUR_API_KEY", # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` + ## What is a Remote Agent Server? A Remote Agent Server is an HTTP/WebSocket server that: +- **Packages the Agent SDK into containers** that you deploy on your own infrastructure (Kubernetes, VMs, on-prem, or cloud) +- **Runs agents** on dedicated infrastructure +- **Manages workspaces** (Docker containers or remote sandboxes) +- **Streams events** to clients via WebSocket +- **Handles command and file operations** (execute command, upload, download); see the [base class](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py) for more details +- **Provides isolation** between different agent executions + +Think of it as the "backend" for your agent, while your Python code acts as the "frontend" client. 
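The command and file operations listed above are available directly on the workspace handle. Here is a minimal sketch; the method names follow the other examples in this repo, the exact signatures live on the workspace base class linked above, and the import is deferred inside the function so the sketch stays self-contained:

```python
def remote_workspace_ops() -> None:
    """Sketch: run a command and move files through a Docker-backed remote workspace."""
    from openhands.workspace import DockerWorkspace  # deferred import

    with DockerWorkspace(
        server_image="ghcr.io/openhands/agent-server:latest-python",
    ) as workspace:
        # Execute a command inside the sandboxed workspace
        result = workspace.execute_command("python --version")
        print(result.stdout)

        # Move files between the client host and the workspace
        workspace.upload_file("input.csv", "/workspace/input.csv")
        workspace.download_file("/workspace/output.csv", "output.csv")
```

Because the workspace is the client-side handle for the agent server, these calls work the same whether the server runs in a local Docker container or behind a remote runtime API.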
+{/* +Same interfaces as local: +[BaseConversation](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py), +[ConversationStateProtocol](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py), +[EventsListBase](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/events_list_base.py). Server-backed impl: +[RemoteConversation](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py). + */} + + ## Architecture Overview +Remote Agent Servers follow a simple three-part architecture: + ```mermaid graph TD - Client[Client Code] -->|HTTP/WebSocket| Server[Agent Server] + Client[Client Code] -->|HTTP / WebSocket| Server[Agent Server] Server --> Workspace[Workspace] - + subgraph Workspace Types - Workspace --> Local[Local Workspace] + Workspace --> Local[Local Folder] Workspace --> Docker[Docker Container] - Workspace --> API[Remote API VM] + Workspace --> API[Remote Sandbox via API] end - + Local --> Files[File System] - Docker --> Container[Isolated Container] + Docker --> Container[Isolated Runtime] API --> Cloud[Cloud Infrastructure] - + style Client fill:#e1f5fe style Server fill:#fff3e0 style Workspace fill:#e8f5e8 ``` -## How Remote Conversations Work - -### 1. Workspace Connection - -When you create a `Conversation` with a remote workspace, it automatically becomes a `RemoteConversation`: - -```python -from openhands.sdk import Conversation -from openhands.workspace import DockerWorkspace - -with DockerWorkspace(server_image="ghcr.io/openhands/agent-server:latest") as workspace: - # Conversation automatically detects remote workspace - conversation = Conversation(agent=agent, workspace=workspace) - # conversation is now a RemoteConversation -``` - -### 2. 
Agent Server Initialization - -The workspace (DockerWorkspace or APIRemoteWorkspace) automatically: -- Starts an agent server in the workspace -- Waits for the server to be ready -- Provides the server URL to the conversation - -### 3. Event Streaming - -The RemoteConversation establishes a WebSocket connection to stream events: - -```python -def event_callback(event): - print(f"Received: {type(event).__name__}") - -conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback] # Callbacks receive real-time events -) -``` - -Events flow: -1. Client sends message via HTTP -2. Agent server executes agent logic -3. Events stream back via WebSocket -4. Client callbacks receive events in real-time - -### 4. File Operations - -File operations go through the agent server API: - -```python -# Upload files to workspace -workspace.upload_file(local_path, remote_path) - -# Download files from workspace -workspace.download_file(remote_path, local_path) - -# Execute commands -result = workspace.execute_command("ls -la") -print(result.stdout) -``` - -## Workspace Types - -The SDK provides three workspace types for different use cases: - -### Local Workspace (Development) - -- **Description**: Direct execution on your machine -- **Setup**: None (default) -- **Use Case**: Quick development and testing -- **Security**: āš ļø Low - full host access - -```python -from openhands.sdk import Conversation - -# Local workspace is implicit -conversation = Conversation(agent=agent) -# Runs directly on your machine -``` - -### Docker Workspace (Self-Hosted) - -- **Description**: Isolated Docker containers -- **Setup**: Docker required -- **Use Case**: Production, testing, multi-tenant -- **Security**: āœ… High - complete isolation - -```python -from openhands.workspace import DockerWorkspace - -with DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest-python" -) as workspace: - conversation = Conversation(agent=agent, workspace=workspace) - 
# Runs in isolated Docker container -``` - -### API Remote Workspace (Managed) - -- **Description**: Hosted runtime API service -- **Setup**: API key only -- **Use Case**: Zero-ops production -- **Security**: āœ… High - managed isolation - -```python -from openhands.workspace import APIRemoteWorkspace - -with APIRemoteWorkspace( - runtime_api_url="https://runtime.example.com", - runtime_api_key="your-key" -) as workspace: - conversation = Conversation(agent=agent, workspace=workspace) - # Runs on managed infrastructure -``` - -## Comparison Table - -| Feature | Local | Docker | API Remote | -|---------|-------|--------|------------| -| **Setup** | None | Docker | API key | -| **Isolation** | None | High | High | -| **Infrastructure** | Local | Self-managed | Managed | -| **Scalability** | N/A | Manual | Automatic | -| **Cost** | Free | Infrastructure | Usage-based | -| **Best For** | Development | Control & Testing | Production | - -## When to Use Remote Agent Servers - -### Use Remote Servers When You Need: - -**Isolation** -- Protect host system from agent actions -- Run untrusted or experimental code safely -- Separate different agent workloads - -**Scalability** -- Handle multiple concurrent users -- Auto-scale based on demand -- Distribute workload across servers - -**Production Deployment** -- Centralized agent management -- Monitoring and logging -- Resource control and limits - -**Team Collaboration** -- Multiple developers sharing infrastructure -- Consistent development environments -- Centralized configuration +1. **Client (Python SDK)** — Your application creates and controls conversations using the SDK. +2. **Agent Server** — A lightweight HTTP/WebSocket service that runs the agent and manages workspace execution. +3. **Workspace** — An isolated environment (local, Docker, or remote VM) where the agent code runs. 
-### Use Local Workspace When: +The same SDK API works across all three workspace types—you just switch which workspace the conversation connects to. -- Rapid development iteration -- Direct access to local files needed -- No isolation requirements -- Single-user development - -## Getting Started - -### Quick Start: Local Agent Server - -Test remote conversation features locally: - -```python -from openhands.sdk import Workspace, Conversation - -# Start local agent server -workspace = Workspace(host="http://localhost:8000") - -conversation = Conversation(agent=agent, workspace=workspace) -conversation.send_message("Hello!") -conversation.run() -``` +## How Remote Conversations Work -See: [Local Agent Server](/sdk/guides/agent-server/local-agent-server) +Each step in the diagram maps directly to how the SDK and server interact: -### Production: Docker Sandboxed Server +### 1. Workspace Connection → *(Client → Server)* -Run agents in isolated Docker containers: +When you create a conversation with a remote workspace (e.g., `DockerWorkspace` or `APIRemoteWorkspace`), the SDK automatically starts or connects to an agent server inside that workspace: ```python -from openhands.workspace import DockerWorkspace - -with DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest-python" -) as workspace: +with DockerWorkspace(server_image="ghcr.io/openhands/agent-server:latest") as workspace: conversation = Conversation(agent=agent, workspace=workspace) - conversation.send_message("Deploy my application") - conversation.run() ``` -See: [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server) +This turns the local `Conversation` into a **[RemoteConversation](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** that speaks to the agent server over HTTP/WebSocket. -### Zero-Ops: API Sandboxed Server -Use managed infrastructure: +### 2. 
Server Initialization → *(Server → Workspace)* -```python -from openhands.workspace import APIRemoteWorkspace +Once the workspace starts, it: +- Launches the agent server process. +- Waits for the server to be ready. +- Shares the server URL with the SDK client. -with APIRemoteWorkspace( - runtime_api_url="https://runtime.example.com", - runtime_api_key="your-key" -) as workspace: - conversation = Conversation(agent=agent, workspace=workspace) - conversation.send_message("Analyze this codebase") - conversation.run() -``` - -See: [API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server) +You don’t need to manage this manually—the workspace context handles startup and teardown automatically. -## Advanced Features +### 3. Event Streaming → *(Bidirectional WebSocket)* -### Browser Automation - -Enable browser tools in Docker: +The client and agent server maintain a live WebSocket connection for streaming events: ```python -from openhands.workspace import DockerWorkspace +def on_event(event): + print(f"Received: {type(event).__name__}") -with DockerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - extra_ports=True # Enables VNC access -) as workspace: - agent = get_default_agent(llm=llm, cli_mode=False) # Browser enabled - conversation = Conversation(agent=agent, workspace=workspace) ``` -See: [Browser with Docker](/sdk/guides/agent-server/browser-with-docker) +This allows you to see real-time updates from the running agent as it executes tasks inside the workspace. -### VS Code Integration +### 4. 
File and Command Operations → *(Server ↔ Workspace)* -Access VS Code Web in container: +The workspace supports file and command operations via the agent server API ([base class](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)), ensuring isolation and consistent behavior: ```python -with DockerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - extra_ports=True # Enables VS Code Web -) as workspace: - # VS Code available at http://localhost:8011 - conversation = Conversation(agent=agent, workspace=workspace) +workspace.file_upload(local_path, remote_path) +workspace.file_download(remote_path, local_path) +result = workspace.execute_command("ls -la") +print(result.stdout) ``` -See: [VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker) - -## Security Considerations - -### Docker Isolation +These commands are proxied through the agent server, whether it’s a Docker container or a remote VM, keeping your client code environment-agnostic. -Docker workspaces provide: -- **Process isolation**: Agent cannot affect host processes -- **File system isolation**: Agent cannot access host files -- **Network isolation**: Controlled network access -- **Resource limits**: CPU/memory constraints +### Summary -### Best Practices +The architecture makes remote execution seamless: +- Your **client code** stays the same. +- The **agent server** manages execution and streaming. +- The **workspace** provides secure, isolated runtime environments. -1. **Use Pre-built Images**: Faster startup, verified content -2. **Set Resource Limits**: Prevent resource exhaustion -3. **Enable Authentication**: Protect server endpoints -4. **Monitor Activity**: Log and audit agent actions -5. **Update Regularly**: Keep server images current +Switching from local to remote is just a matter of swapping the workspace class—no code rewrites needed. 
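The point about swapping workspace classes can be made concrete with a small stand-in sketch (illustrative classes, not the real SDK): because client code depends only on the workspace interface, changing the execution environment is a one-line change.

```python
class LocalWorkspace:
    """Stand-in: runs commands directly on the host."""

    def execute_command(self, cmd: str) -> str:
        return f"[local] {cmd}"


class DockerWorkspace:
    """Stand-in: would proxy commands through an agent server in a container."""

    def execute_command(self, cmd: str) -> str:
        return f"[docker] {cmd}"


def client_task(workspace) -> str:
    # Client code is identical for every workspace type.
    return workspace.execute_command("ls -la")


print(client_task(LocalWorkspace()))   # [local] ls -la
print(client_task(DockerWorkspace()))  # [docker] ls -la
```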
## Next Steps @@ -311,5 +155,5 @@ Explore different deployment options: - **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Code editing For architectural details: -- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Technical deep dive -- **[Workspace Architecture](/sdk/arch/sdk/workspace)** - Workspace internals +- **[Agent Server Package Architecture](/sdk/arch/agent-server-package)** - Remote execution architecture and deployment +- **[Workspace Package Architecture](/sdk/arch/workspace-package)** - Execution environments and isolation diff --git a/sdk/guides/hello-world.mdx b/sdk/guides/hello-world.mdx index a644dd3f..d37adf50 100644 --- a/sdk/guides/hello-world.mdx +++ b/sdk/guides/hello-world.mdx @@ -66,7 +66,7 @@ Use the preset agent with common built-in tools: agent = get_default_agent(llm=llm, cli_mode=True) ``` -The default agent includes BashTool, FileEditorTool, etc. See [Tools Overview](/sdk/arch/tools/overview) for the complete list of available tools. +The default agent includes BashTool, FileEditorTool, etc. See [Tools Package Architecture](/sdk/arch/tools-package) for the complete list of available tools. 
### Conversation Start a conversation to manage the agent's lifecycle: From 64f58d5b663c74ed27f452e9cdabeaff77323762 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 16:37:08 -0400 Subject: [PATCH 29/58] audited local server --- ...ocal-agent-server.mdx => local-server.mdx} | 31 +++++++------------ 1 file changed, 11 insertions(+), 20 deletions(-) rename sdk/guides/agent-server/{local-agent-server.mdx => local-server.mdx} (91%) diff --git a/sdk/guides/agent-server/local-agent-server.mdx b/sdk/guides/agent-server/local-server.mdx similarity index 91% rename from sdk/guides/agent-server/local-agent-server.mdx rename to sdk/guides/agent-server/local-server.mdx index 36420dc6..0c0b3b1c 100644 --- a/sdk/guides/agent-server/local-agent-server.mdx +++ b/sdk/guides/agent-server/local-server.mdx @@ -3,7 +3,7 @@ title: Local Agent Server description: Run agents through a local HTTP server with RemoteConversation for client-server architecture. --- -The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using RemoteConversation. This enables separation between client code and agent execution, making it possible to run agents on dedicated servers or different machines. +The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using RemoteConversation. This pattern is useful for local development, testing, and scenarios where you want to separate the client code from the agent execution environment. 
## Basic Example @@ -294,20 +294,21 @@ The server starts with `python -m openhands.agent_server` and automatically hand ### Remote Workspace -When using a remote server, you need to provide a `Workspace` that connects to that server: +When connecting to a remote server, you need to provide a `Workspace` that connects to that server: -```python highlight={157-158} +```python workspace = Workspace(host=server.base_url) result = workspace.execute_command("pwd") ``` +When `host` is provided, the `Workspace` returns an instance of `RemoteWorkspace` ([source](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/workspace.py)). The `Workspace` object communicates with the remote server's API to execute commands and manage files. ### RemoteConversation -When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation`: +When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation` ([source](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)): -```python highlight={164-170} +```python conversation = Conversation( agent=agent, workspace=workspace, @@ -323,7 +324,7 @@ RemoteConversation handles communication with the remote agent server over WebSo Callbacks receive events in real-time as they happen on the remote server: -```python highlight={148-153} +```python def event_callback(event): """Callback to capture events for testing.""" event_type = type(event).__name__ @@ -338,7 +339,7 @@ This enables monitoring agent activity, tracking progress, and implementing cust The conversation state provides access to all events and status: -```python highlight={208-214} +```python # Count total events using state.events total_events = len(conversation.state.events) logger.info(f"šŸ“ˆ Total events in conversation: {total_events}") @@ -350,18 +351,8 @@ recent_events = all_events[-5:] if len(all_events) >= 5 else 
all_events This allows you to inspect the conversation history, analyze agent behavior, and build custom monitoring tools. -## When to Use Local Agent Server - -The local agent server pattern is useful when you want to: - -- **Separate Concerns**: Keep client code separate from agent execution -- **Development**: Test remote conversation features locally -- **Multi-Client**: Allow multiple clients to connect to the same server -- **Debugging**: Inspect server logs and behavior independently -- **Preparation**: Prepare code for deploying to a real remote server - ## Next Steps -- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Run server in Docker for isolation -- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server)** - Connect to hosted API service -- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Deep dive into server internals +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run server in Docker for isolation +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Package Architecture](/sdk/arch/agent-server-package)** - Architecture and design decisions From 51f83bc160dcbe5dfaae8dbc68cdace511f3d2a3 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 17:25:17 -0400 Subject: [PATCH 30/58] improve remote agent server doc --- ...i-sandboxed-server.mdx => api-sandbox.mdx} | 46 +- .../agent-server/browser-with-docker.mdx | 3 +- sdk/guides/agent-server/docker-sandbox.mdx | 683 ++++++++++++++++++ .../agent-server/docker-sandboxed-server.mdx | 264 ------- .../agent-server/vscode-with-docker.mdx | 1 - 5 files changed, 689 insertions(+), 308 deletions(-) rename sdk/guides/agent-server/{api-sandboxed-server.mdx => api-sandbox.mdx} (74%) create mode 100644 sdk/guides/agent-server/docker-sandbox.mdx delete mode 100644 sdk/guides/agent-server/docker-sandboxed-server.mdx diff --git 
a/sdk/guides/agent-server/api-sandboxed-server.mdx b/sdk/guides/agent-server/api-sandbox.mdx similarity index 74% rename from sdk/guides/agent-server/api-sandboxed-server.mdx rename to sdk/guides/agent-server/api-sandbox.mdx index 8dc6fb5d..650b4ad9 100644 --- a/sdk/guides/agent-server/api-sandboxed-server.mdx +++ b/sdk/guides/agent-server/api-sandbox.mdx @@ -3,7 +3,7 @@ title: API Sandboxed Server description: Connect to hosted API-based agent server for fully managed infrastructure. --- -The API Sandboxed Server demonstrates how to use APIRemoteWorkspace to connect to a hosted runtime API service. This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. +The API Sandboxed Server demonstrates how to use APIRemoteWorkspace to connect to an [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. 
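The examples in this guide fail fast when required credentials (such as the runtime API key) are missing. A minimal sketch of that guard, with a hypothetical helper name:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} environment variable is not set.")
    return value


# For illustration only: seed a demo value so the call below succeeds.
os.environ.setdefault("RUNTIME_API_KEY", "demo-key")
print(require_env("RUNTIME_API_KEY"))
```

In real use you would export the key in your shell instead of seeding it in code.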
## Basic Example @@ -64,7 +64,7 @@ if not runtime_api_key: with APIRemoteWorkspace( runtime_api_url="https://runtime.eval.all-hands.dev", runtime_api_key=runtime_api_key, - server_image="ghcr.io/openhands/agent-server:latest-python", + server_image="ghcr.io/openhands/agent-server:main-python", ) as workspace: agent = get_default_agent(llm=llm, cli_mode=True) received_events: list = [] @@ -116,7 +116,7 @@ The `APIRemoteWorkspace` connects to a hosted runtime API service: with APIRemoteWorkspace( runtime_api_url="https://runtime.eval.all-hands.dev", runtime_api_key=runtime_api_key, - server_image="ghcr.io/openhands/agent-server:latest-python", + server_image="ghcr.io/openhands/agent-server:main-python", ) as workspace: ``` @@ -147,7 +147,7 @@ You can specify which pre-built agent server image to use: APIRemoteWorkspace( runtime_api_url="https://runtime.eval.all-hands.dev", runtime_api_key=runtime_api_key, - server_image="ghcr.io/openhands/agent-server:latest-python", + server_image="ghcr.io/openhands/agent-server:main-python", ) ``` @@ -179,44 +179,8 @@ assert isinstance(conversation, RemoteConversation) All agent execution happens on the remote runtime infrastructure. 
-## When to Use API Sandboxed Server - -Use a hosted runtime API when you want: - -- **Zero Infrastructure**: No servers to manage or maintain -- **Automatic Scaling**: Handle varying workloads without configuration -- **Managed Security**: Professionally maintained sandboxed environments -- **Quick Start**: Get running without Docker or server setup -- **Production Ready**: Enterprise-grade reliability and monitoring - -## Benefits - -### Operational Advantages - -- **No DevOps**: Infrastructure is managed for you -- **Always Updated**: Latest security patches and features -- **Cost Efficient**: Pay only for usage -- **Global Distribution**: Low latency from multiple regions - -### Security Advantages - -- **Isolation**: Each execution in a fresh sandboxed container -- **Monitoring**: Built-in threat detection and logging -- **Compliance**: Pre-configured security controls -- **Auditing**: Full audit trails for all operations - -## Comparison with Other Workspace Types - -| Feature | APIRemoteWorkspace | DockerWorkspace | Local Workspace | -|---------|-------------------|-----------------|-----------------| -| **Setup** | API key only | Docker required | None | -| **Infrastructure** | Fully managed | Self-managed | Local only | -| **Isolation** | High | High | None | -| **Scalability** | Automatic | Manual | N/A | -| **Best For** | Production | Control | Development | - ## Next Steps - **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Self-hosted alternative - **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)** - Development setup -- **[Agent Server Architecture](/sdk/arch/agent_server/overview)** - Technical details +- **[Agent Server Package Architecture](/sdk/arch/agent-server-package)** - Architecture and design decisions diff --git a/sdk/guides/agent-server/browser-with-docker.mdx b/sdk/guides/agent-server/browser-with-docker.mdx index fc231a58..ac51099e 100644 --- 
a/sdk/guides/agent-server/browser-with-docker.mdx +++ b/sdk/guides/agent-server/browser-with-docker.mdx @@ -55,7 +55,6 @@ with DockerWorkspace( # TODO: Change this to your platform if not linux/arm64 platform=detect_platform(), extra_ports=True, # Expose extra ports for VSCode and VNC - forward_env=["LLM_API_KEY"], # Forward API key to container ) as workspace: """Extra ports allows you to check localhost:8012 for VNC""" @@ -234,4 +233,4 @@ This is particularly useful for: - **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Enable VS Code integration - **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Base Docker setup -- **[Browser Tool Architecture](/sdk/arch/tools/browser_use)** - Technical details +- **[Tools Package Architecture](/sdk/arch/tools-package)** - Built-in tools including BrowserUseTool diff --git a/sdk/guides/agent-server/docker-sandbox.mdx b/sdk/guides/agent-server/docker-sandbox.mdx new file mode 100644 index 00000000..7570ba7f --- /dev/null +++ b/sdk/guides/agent-server/docker-sandbox.mdx @@ -0,0 +1,683 @@ +--- +title: Docker Sandbox +description: Run agent server in isolated Docker containers for security and reproducibility. +--- + +The Docker-sandboxed agent server demonstrates how to run agents in isolated Docker containers using DockerWorkspace. + +This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely. + +The Docker sandbox image ships with features configured in the [Dockerfile](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-agent-server/openhands/agent_server/docker/Dockerfile) (e.g., secure defaults and services like VSCode and VNC exposed behind well-defined ports), which are not available in the local (non-Docker) agent server. 
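Several of the examples below pass a Docker `platform` string; the helper they use is small enough to reproduce standalone (copied from the examples, runnable as-is):

```python
import platform


def detect_platform() -> str:
    """Map the host CPU architecture to a Docker platform string."""
    machine = platform.machine().lower()
    if "arm" in machine or "aarch64" in machine:
        return "linux/arm64"
    return "linux/amd64"


print(detect_platform() in ("linux/arm64", "linux/amd64"))  # True
```

This keeps the container image compatible with both Intel/AMD and ARM (Apple Silicon) hosts.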
+ +## 1) Basic Docker Sandbox + + +This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) + + +This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +# 2) Create a Docker-based remote workspace that will set up and manage +# the Docker container automatically +with DockerWorkspace( + # dynamically build agent-server image + # base_image="nikolaik/python-nodejs:python3.12-nodejs22", + # use pre-built image for faster startup + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + 
event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") + + logger.info("šŸ“ Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("šŸš€ Running conversation...") + conversation.run() + logger.info("āœ… First task completed!") + logger.info(f"Agent status: {conversation.state.agent_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("ā³ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("āœ… Events have stopped") + + logger.info("šŸš€ Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("āœ… Second task completed!") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py +``` + +### Key Concepts + +#### DockerWorkspace Context Manager + +The `DockerWorkspace` uses a context manager to automatically handle container lifecycle: + +```python highlight={42-50} +with DockerWorkspace( + # dynamically build agent-server image + # base_image="nikolaik/python-nodejs:python3.12-nodejs22", + # use pre-built image for faster startup + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, + platform=detect_platform(), +) as workspace: + # Container is running here + # Work with the workspace + pass +# Container is automatically stopped and cleaned up here +``` + +The workspace automatically: +- Pulls or builds the Docker image +- Starts the container with an agent server +- Waits for the server to be ready +- Cleans up the container when done + +#### Platform Detection + +The example includes platform detection to ensure the correct Docker image is built and used: + +```python highlight={32-37} +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" +``` + +This ensures compatibility across different CPU architectures (Intel/AMD vs ARM/Apple Silicon). + + +#### Testing the Workspace + +Before creating a conversation, the example tests the workspace connection: + +```python highlight={68-74} +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' 
&& pwd" +) +logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" +) +logger.info(f"Output: {result.stdout}") +``` + +This verifies the workspace is properly initialized and can execute commands. + +#### Automatic RemoteConversation + +When you use a DockerWorkspace, the Conversation automatically becomes a RemoteConversation: + +```python highlight={75-81} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` + +The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming. + + +#### Pre-built vs Base Images + +```python +# āœ… Fast: Use pre-built image (recommended) +DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, +) + +# ā±ļø Slower: Build on the fly from base image (more control) +DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, +) +``` + +Pre-built images start immediately, while base images need to build the agent server first. + +--- + +## 2) VS Code in Docker Sandbox + + +This example is available on GitHub: [examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py) + + +VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with. 
+ +```python icon="python" expandable examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py +import os +import time + +import httpx +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + +# Create a Docker-based remote workspace with extra ports for VSCode access +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=18010, + # TODO: Change this to your platform if not linux/arm64 + platform="linux/arm64", + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:8011""" + + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") + logger.info("šŸ“ Sending first message...") + conversation.send_message("Create a simple Python script that prints Hello World") 
+ conversation.run() + + # Get VSCode URL with token + vscode_port = (workspace.host_port or 8010) + 1 + try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) + except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" + + # Wait for user to explore VSCode + y = None + while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py +``` + +### Key Concepts + +#### VS Code-Enabled DockerWorkspace + +The workspace is configured with extra ports for VS Code access: + +```python highlight={27-34} +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=18010, + platform="linux/arm64", + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:8011""" +``` + +The `extra_ports=True` setting exposes: +- Port `host_port+1`: VS Code Web interface (host_port + 1) +- Port `host_port+2`: VNC viewer for visual access + + +#### VS Code URL Generation + +The example retrieves the VS Code URL with authentication token: + 
+```python highlight={68-86} +# Get VSCode URL with token +vscode_port = (workspace.host_port or 8010) + 1 +try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) +except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" +``` + +This generates a properly authenticated URL with the workspace directory pre-opened. + +#### Agent Task Execution + +The agent creates files that you can then inspect in VS Code: + +```python highlight={62-65} +logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") +logger.info("šŸ“ Sending first message...") +conversation.send_message("Create a simple Python script that prints Hello World") +conversation.run() +``` + +After the agent completes the task, you can open VS Code to see the generated files. + +#### Interactive VS Code Access + +The example waits for user confirmation before exiting: + +```python highlight={88-102} +y = None +while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + +This gives you time to explore the workspace in VS Code before the container is cleaned up. 
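The port convention used above (VS Code on `host_port + 1`, VNC on `host_port + 2` when `extra_ports=True`) can be captured in a tiny helper sketch; the function name is hypothetical:

```python
def service_ports(host_port: int) -> dict:
    """Derive the extra service ports exposed when extra_ports=True."""
    return {
        "agent_server": host_port,
        "vscode": host_port + 1,  # VS Code Web interface
        "vnc": host_port + 2,     # VNC viewer
    }


ports = service_ports(8010)
print(ports["vscode"], ports["vnc"])  # 8011 8012
```

With `host_port=8010` this yields the `localhost:8011` (VS Code) and `localhost:8012` (VNC) addresses used in the examples.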
+ +#### When to Use VS Code with Docker + +Use VS Code-enabled Docker workspaces when you need: +- Code Inspection: Review files created by the agent +- Manual Editing: Make manual corrections or additions +- Debugging: Investigate issues in the workspace +- Learning: Understand what the agent is doing +- Collaboration: Share workspace access with team members + +#### Benefits + +- Full IDE Experience: Complete VS Code features in browser +- No Local Setup: No need to install VS Code locally +- Isolated Environment: All edits happen in container +- Pre-configured: Optimized settings for agent workflows + +#### VS Code URL Format + +``` +http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir} +``` + +- vscode_port: Usually host_port + 1 (e.g., 8011) +- tkn: Authentication token for security +- folder: Workspace directory to open + +--- + +## 3) Browser in Docker Sandbox + + +This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) + + +Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. 
+ +This example shows how to configure DockerWorkspace with browser capabilities and VNC access: + +```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", + base_url="https://llm-proxy.eval.all-hands.dev", + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +# Create a Docker-based remote workspace with extra ports for browser access +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, + # TODO: Change this to your platform if not linux/arm64 + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" + + # Create agent with browser tools enabled + agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation 
using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") + logger.info("šŸ“ Sending first message...") + conversation.send_message( + "Could you go to https://all-hands.dev/ blog page and summarize main " + "points of the latest blog?" + ) + conversation.run() + + # Wait for user confirm to exit + y = None + while y != "y": + y = input( + "Because you've enabled extra_ports=True in DockerWorkspace, " + "you can open a browser tab to see the *actual* browser OpenHands " + "is interacting with via VNC.\n\n" + "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + +```bash Running the Example +export LLM_API_KEY="your-api-key" +cd agent-sdk +uv run python examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +``` + +### Key Concepts + +#### Browser-Enabled DockerWorkspace + +The workspace is configured with extra ports for browser access: + +```python highlight={36-43} +with DockerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" +``` + +The `extra_ports=True` setting exposes additional ports for: +- Port 8011: VS Code Web interface +- Port 8012: VNC viewer for browser visualization + +#### Enabling Browser Tools + +Browser tools are enabled by setting `cli_mode=False`: + +```python highlight={46-50} +# Create agent with browser tools enabled +agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools +) +``` + +When `cli_mode=False`, the agent gains access to browser automation tools for web 
interaction.
+
+#### Sending Browser Tasks
+
+The agent can perform web automation tasks:
+
+```python highlight={72-76}
+logger.info("šŸ“ Sending first message...")
+conversation.send_message(
+    "Could you go to https://all-hands.dev/ blog page and summarize main "
+    "points of the latest blog?"
+)
+```
+
+The agent will use browser tools to navigate to the URL, read the content, and provide a summary.
+
+#### Visual Browser Access
+
+With `extra_ports=True`, you can watch the browser in real-time via VNC:
+
+```python highlight={80-89}
+y = None
+while y != "y":
+    y = input(
+        "Because you've enabled extra_ports=True in DockerWorkspace, "
+        "you can open a browser tab to see the *actual* browser OpenHands "
+        "is interacting with via VNC.\n\n"
+        "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n"
+        "Press 'y' and Enter to exit and terminate the workspace.\n"
+        ">> "
+    )
+```
+
+This allows you to:
+- See exactly what the agent is doing in the browser
+- Debug browser automation issues
+- Understand agent behavior visually
+
+#### When to Use Browser with Docker
+
+Use browser-enabled Docker workspaces when you need:
+- Web Scraping: Extract data from websites safely
+- Web Automation: Automate web-based workflows
+- Testing: Test web applications in isolated environments
+- Visual Monitoring: Watch agent interactions in real-time
+- Security: Isolate web browsing from your host system
+
+#### Benefits
+
+- Isolation: Browser runs in container, not on your host
+- Clean State: Fresh browser for each run
+- No Pollution: No cookies, cache, or history on your machine
+- Visual Debugging: Watch the browser via VNC
+- Reproducible: Same environment every time
+- Easy Cleanup: Container removal clears everything
+- Production: Control CPU/memory, run concurrent sessions, monitor usage
+
+#### VNC Access
+
+The VNC interface provides real-time visual access to the browser:
+
+```
+http://localhost:8012/vnc.html?autoconnect=1&resize=remote +``` + +- autoconnect=1: Automatically connect to VNC server +- resize=remote: Automatically adjust resolution + +--- + +## Next Steps + +- API Sandboxed Server: /sdk/guides/agent-server/api-sandbox +- Local Agent Server: /sdk/guides/agent-server/local-server +- Agent Server Overview: /sdk/guides/agent-server/overview diff --git a/sdk/guides/agent-server/docker-sandboxed-server.mdx b/sdk/guides/agent-server/docker-sandboxed-server.mdx deleted file mode 100644 index 05870971..00000000 --- a/sdk/guides/agent-server/docker-sandboxed-server.mdx +++ /dev/null @@ -1,264 +0,0 @@ ---- -title: Docker Sandboxed Server -description: Run agents in isolated Docker containers for security and reproducibility. ---- - -The Docker Sandboxed Server demonstrates how to run agents in isolated Docker containers using DockerWorkspace. This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely. - -## Basic Example - - -This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py) - - -This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution: - -```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py -import os -import platform -import time - -from pydantic import SecretStr - -from openhands.sdk import ( - LLM, - Conversation, - RemoteConversation, - get_logger, -) -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import DockerWorkspace - - -logger = get_logger(__name__) - - -# 1) Ensure we have LLM API key -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." 
- -llm = LLM( - usage_id="agent", - model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", - api_key=SecretStr(api_key), -) - - -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" - - -# 2) Create a Docker-based remote workspace that will set up and manage -# the Docker container automatically -with DockerWorkspace( - # dynamically build agent-server image - # base_image="nikolaik/python-nodejs:python3.12-nodejs22", - # use pre-built image for faster startup - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=8010, - platform=detect_platform(), - forward_env=["LLM_API_KEY"], # Forward API key to container -) as workspace: - # 3) Create agent - agent = get_default_agent( - llm=llm, - cli_mode=True, - ) - - # 4) Set up callback collection - received_events: list = [] - last_event_time = {"ts": time.time()} - - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() - - # 5) Test the workspace with a simple command - result = workspace.execute_command( - "echo 'Hello from sandboxed environment!' && pwd" - ) - logger.info( - f"Command '{result.command}' completed with exit code {result.exit_code}" - ) - logger.info(f"Output: {result.stdout}") - conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - visualize=True, - ) - assert isinstance(conversation, RemoteConversation) - - try: - logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") - - logger.info("šŸ“ Sending first message...") - conversation.send_message( - "Read the current repo and write 3 facts about the project into FACTS.txt." 
- ) - logger.info("šŸš€ Running conversation...") - conversation.run() - logger.info("āœ… First task completed!") - logger.info(f"Agent status: {conversation.state.agent_status}") - - # Wait for events to settle (no events for 2 seconds) - logger.info("ā³ Waiting for events to stop...") - while time.time() - last_event_time["ts"] < 2.0: - time.sleep(0.1) - logger.info("āœ… Events have stopped") - - logger.info("šŸš€ Running conversation again...") - conversation.send_message("Great! Now delete that file.") - conversation.run() - logger.info("āœ… Second task completed!") - finally: - print("\n🧹 Cleaning up conversation...") - conversation.close() -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py -``` - -## Key Concepts - -### DockerWorkspace Context Manager - -The `DockerWorkspace` uses a context manager to automatically handle container lifecycle: - -```python highlight={42-50} -with DockerWorkspace( - # use pre-built image for faster startup - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=8010, - platform=detect_platform(), - forward_env=["LLM_API_KEY"], # Forward API key to container -) as workspace: - # Container is running here - # Work with the workspace - pass -# Container is automatically stopped and cleaned up here -``` - -The workspace automatically: -- Pulls or builds the Docker image -- Starts the container with an agent server -- Waits for the server to be ready -- Cleans up the container when done - -### Platform Detection - -The example includes platform detection to ensure the correct Docker image is used: - -```python highlight={32-37} -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" -``` - -This ensures compatibility across different CPU 
architectures (Intel/AMD vs ARM/Apple Silicon). - -### Environment Forwarding - -You can forward environment variables from your host to the container: - -```python highlight={49} -DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=8010, - platform=detect_platform(), - forward_env=["LLM_API_KEY"], # Forward API key to container -) -``` - -This allows the agent running inside the container to access necessary credentials. - -### Testing the Workspace - -Before creating a conversation, the example tests the workspace connection: - -```python highlight={68-74} -result = workspace.execute_command( - "echo 'Hello from sandboxed environment!' && pwd" -) -logger.info( - f"Command '{result.command}' completed with exit code {result.exit_code}" -) -logger.info(f"Output: {result.stdout}") -``` - -This verifies the workspace is properly initialized and can execute commands. - -### Automatic RemoteConversation - -When you use a DockerWorkspace, the Conversation automatically becomes a RemoteConversation: - -```python highlight={75-81} -conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - visualize=True, -) -assert isinstance(conversation, RemoteConversation) -``` - -The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming. 
- -## When to Use Docker Sandboxed Server - -Use Docker containers when you need: - -- **Security**: Complete isolation from host system -- **Production**: Deploy agents in controlled environments -- **Testing**: Clean, reproducible test environments -- **Multi-tenant**: Isolate different users or workloads -- **Resource Control**: Set CPU/memory limits per container - -## Configuration Options - -### Pre-built vs Base Images - -```python -# āœ… Fast: Use pre-built image (recommended) -DockerWorkspace( - server_image="ghcr.io/openhands/agent-server:latest-python", - host_port=8010, -) - -# ā±ļø Slower: Build from base image (more control) -DockerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - host_port=8010, -) -``` - -Pre-built images start immediately, while base images need to build the agent server first. - -### Resource Limits - -When running Docker containers, you can set resource limits: - -```bash -docker run --memory="2g" --cpus="1.5" \ - -e LLM_API_KEY="your-api-key" \ - ghcr.io/openhands/agent-server:latest-python -``` - -## Next Steps - -- **[Browser with Docker](/sdk/guides/agent-server/browser-with-docker)** - Add browser capabilities -- **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Enable VS Code tools -- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server)** - Use managed hosting diff --git a/sdk/guides/agent-server/vscode-with-docker.mdx b/sdk/guides/agent-server/vscode-with-docker.mdx index 29c329a9..1b93ea89 100644 --- a/sdk/guides/agent-server/vscode-with-docker.mdx +++ b/sdk/guides/agent-server/vscode-with-docker.mdx @@ -46,7 +46,6 @@ with DockerWorkspace( # TODO: Change this to your platform if not linux/arm64 platform="linux/arm64", extra_ports=True, # Expose extra ports for VSCode and VNC - forward_env=["LLM_API_KEY"], # Forward API key to container ) as workspace: """Extra ports allows you to access VSCode at localhost:8011""" From 
d3b268740d64f2475fa688c151aab4edf259806e Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 17:25:26 -0400 Subject: [PATCH 31/58] remove --- .../agent-server/browser-with-docker.mdx | 236 -------------- .../agent-server/vscode-with-docker.mdx | 291 ------------------ 2 files changed, 527 deletions(-) delete mode 100644 sdk/guides/agent-server/browser-with-docker.mdx delete mode 100644 sdk/guides/agent-server/vscode-with-docker.mdx diff --git a/sdk/guides/agent-server/browser-with-docker.mdx b/sdk/guides/agent-server/browser-with-docker.mdx deleted file mode 100644 index ac51099e..00000000 --- a/sdk/guides/agent-server/browser-with-docker.mdx +++ /dev/null @@ -1,236 +0,0 @@ ---- -title: Browser with Docker -description: Enable browser automation with Docker-sandboxed agents for secure web interaction. ---- - -Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. 
- -## Basic Example - - -This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) - - -This example shows how to configure DockerWorkspace with browser capabilities and VNC access: - -```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py -import os -import platform -import time - -from pydantic import SecretStr - -from openhands.sdk import LLM, Conversation, get_logger -from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import DockerWorkspace - - -logger = get_logger(__name__) - - -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." - -llm = LLM( - usage_id="agent", - model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", - api_key=SecretStr(api_key), -) - - -def detect_platform(): - """Detects the correct Docker platform string.""" - machine = platform.machine().lower() - if "arm" in machine or "aarch64" in machine: - return "linux/arm64" - return "linux/amd64" - - -# Create a Docker-based remote workspace with extra ports for browser access -with DockerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - host_port=8010, - # TODO: Change this to your platform if not linux/arm64 - platform=detect_platform(), - extra_ports=True, # Expose extra ports for VSCode and VNC -) as workspace: - """Extra ports allows you to check localhost:8012 for VNC""" - - # Create agent with browser tools enabled - agent = get_default_agent( - llm=llm, - cli_mode=False, # CLI mode = False will enable browser tools - ) - - # Set up callback collection - received_events: list = [] - 
last_event_time = {"ts": time.time()} - - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() - - # Create RemoteConversation using the workspace - conversation = Conversation( - agent=agent, - workspace=workspace, - callbacks=[event_callback], - visualize=True, - ) - assert isinstance(conversation, RemoteConversation) - - logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") - logger.info("šŸ“ Sending first message...") - conversation.send_message( - "Could you go to https://all-hands.dev/ blog page and summarize main " - "points of the latest blog?" - ) - conversation.run() - - # Wait for user confirm to exit - y = None - while y != "y": - y = input( - "Because you've enabled extra_ports=True in DockerWorkspace, " - "you can open a browser tab to see the *actual* browser OpenHands " - "is interacting with via VNC.\n\n" - "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" - "Press 'y' and Enter to exit and terminate the workspace.\n" - ">> " - ) -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py -``` - -## Key Concepts - -### Browser-Enabled DockerWorkspace - -The workspace is configured with extra ports for browser access: - -```python highlight={36-43} -with DockerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - host_port=8010, - platform=detect_platform(), - extra_ports=True, # Expose extra ports for VSCode and VNC - forward_env=["LLM_API_KEY"], # Forward API key to container -) as workspace: - """Extra ports allows you to check localhost:8012 for VNC""" -``` - -The `extra_ports=True` setting exposes additional ports for: -- **Port 8011**: VS Code Web interface -- **Port 8012**: VNC viewer for browser visualization - -### 
Enabling Browser Tools - -Browser tools are enabled by setting `cli_mode=False`: - -```python highlight={46-50} -# Create agent with browser tools enabled -agent = get_default_agent( - llm=llm, - cli_mode=False, # CLI mode = False will enable browser tools -) -``` - -When `cli_mode=False`, the agent gains access to browser automation tools for web interaction. - -### Sending Browser Tasks - -The agent can perform web automation tasks: - -```python highlight={72-76} -logger.info("šŸ“ Sending first message...") -conversation.send_message( - "Could you go to https://all-hands.dev/ blog page and summarize main " - "points of the latest blog?" -) -``` - -The agent will use browser tools to navigate to the URL, read the content, and provide a summary. - -### Visual Browser Access - -With `extra_ports=True`, you can watch the browser in real-time via VNC: - -```python highlight={80-89} -y = None -while y != "y": - y = input( - "Because you've enabled extra_ports=True in DockerWorkspace, " - "you can open a browser tab to see the *actual* browser OpenHands " - "is interacting with via VNC.\n\n" - "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" - "Press 'y' and Enter to exit and terminate the workspace.\n" - ">> " - ) -``` - -This allows you to: -- See exactly what the agent is doing in the browser -- Debug browser automation issues -- Understand agent behavior visually - -## When to Use Browser with Docker - -Use browser-enabled Docker workspaces when you need: - -- **Web Scraping**: Extract data from websites safely -- **Web Automation**: Automate web-based workflows -- **Testing**: Test web applications in isolated environments -- **Visual Monitoring**: Watch agent interactions in real-time -- **Security**: Isolate web browsing from your host system - -## Benefits - -### Isolation Benefits - -- **Secure**: Browser runs in container, not on your host -- **Clean State**: Fresh browser for each run -- **No Pollution**: No cookies, cache, or history 
on your machine - -### Development Benefits - -- **Visual Debugging**: Watch the browser via VNC -- **Reproducible**: Same environment every time -- **Easy Cleanup**: Container removal clears everything - -### Production Benefits - -- **Resource Control**: Limit CPU/memory for browser -- **Concurrent Sessions**: Run multiple isolated browsers -- **Monitoring**: Track browser resource usage - -## VNC Access - -The VNC interface provides real-time visual access to the browser: - -``` -http://localhost:8012/vnc.html?autoconnect=1&resize=remote -``` - -**URL Parameters:** -- `autoconnect=1`: Automatically connect to VNC server -- `resize=remote`: Automatically adjust resolution - -This is particularly useful for: -- Debugging navigation issues -- Verifying visual elements -- Understanding agent decision-making -- Demonstrating agent capabilities - -## Next Steps - -- **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Enable VS Code integration -- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Base Docker setup -- **[Tools Package Architecture](/sdk/arch/tools-package)** - Built-in tools including BrowserUseTool diff --git a/sdk/guides/agent-server/vscode-with-docker.mdx b/sdk/guides/agent-server/vscode-with-docker.mdx deleted file mode 100644 index 1b93ea89..00000000 --- a/sdk/guides/agent-server/vscode-with-docker.mdx +++ /dev/null @@ -1,291 +0,0 @@ ---- -title: VS Code with Docker -description: Enable VS Code Web integration for interactive code editing with Docker-sandboxed agents. ---- - -VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with. 
- -## Basic Example - - -This example is available on GitHub: [examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/agent-sdk/blob/main/examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py) - - -This example shows how to configure DockerWorkspace with VS Code Web access: - -```python icon="python" expandable examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py -import os -import time - -import httpx -from pydantic import SecretStr - -from openhands.sdk import LLM, Conversation, get_logger -from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation -from openhands.tools.preset.default import get_default_agent -from openhands.workspace import DockerWorkspace - - -logger = get_logger(__name__) - - -api_key = os.getenv("LLM_API_KEY") -assert api_key is not None, "LLM_API_KEY environment variable is not set." - -llm = LLM( - usage_id="agent", - model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", - api_key=SecretStr(api_key), -) - -# Create a Docker-based remote workspace with extra ports for VSCode access -with DockerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - host_port=18010, - # TODO: Change this to your platform if not linux/arm64 - platform="linux/arm64", - extra_ports=True, # Expose extra ports for VSCode and VNC -) as workspace: - """Extra ports allows you to access VSCode at localhost:8011""" - - # Create agent - agent = get_default_agent( - llm=llm, - cli_mode=True, - ) - - # Set up callback collection - received_events: list = [] - last_event_time = {"ts": time.time()} - - def event_callback(event) -> None: - event_type = type(event).__name__ - logger.info(f"šŸ”” Callback received event: {event_type}\n{event}") - received_events.append(event) - last_event_time["ts"] = time.time() - - # Create RemoteConversation using the workspace - conversation = Conversation( 
- agent=agent, - workspace=workspace, - callbacks=[event_callback], - visualize=True, - ) - assert isinstance(conversation, RemoteConversation) - - logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") - logger.info("šŸ“ Sending first message...") - conversation.send_message("Create a simple Python script that prints Hello World") - conversation.run() - - # Get VSCode URL with token - vscode_port = (workspace.host_port or 8010) + 1 - try: - response = httpx.get( - f"{workspace.host}/api/vscode/url", - params={"workspace_dir": workspace.working_dir}, - ) - vscode_data = response.json() - vscode_url = vscode_data.get("url", "").replace( - "localhost:8001", f"localhost:{vscode_port}" - ) - except Exception: - # Fallback if server route not available - folder = ( - f"/{workspace.working_dir}" - if not str(workspace.working_dir).startswith("/") - else str(workspace.working_dir) - ) - vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" - - # Wait for user to explore VSCode - y = None - while y != "y": - y = input( - "\n" - "Because you've enabled extra_ports=True in DockerWorkspace, " - "you can open VSCode Web to see the workspace.\n\n" - f"VSCode URL: {vscode_url}\n\n" - "The VSCode should have the OpenHands settings extension installed:\n" - " - Dark theme enabled\n" - " - Auto-save enabled\n" - " - Telemetry disabled\n" - " - Auto-updates disabled\n\n" - "Press 'y' and Enter to exit and terminate the workspace.\n" - ">> " - ) -``` - -```bash Running the Example -export LLM_API_KEY="your-api-key" -cd agent-sdk -uv run python examples/02_remote_agent_server/04_vscode_with_docker_sandboxed_server.py -``` - -## Key Concepts - -### VS Code-Enabled DockerWorkspace - -The workspace is configured with extra ports for VS Code access: - -```python highlight={27-34} -with DockerWorkspace( - base_image="nikolaik/python-nodejs:python3.12-nodejs22", - host_port=18010, - platform="linux/arm64", - extra_ports=True, # Expose extra ports for VSCode and VNC - 
forward_env=["LLM_API_KEY"], # Forward API key to container -) as workspace: - """Extra ports allows you to access VSCode at localhost:8011""" -``` - -The `extra_ports=True` setting exposes: -- **Port 8011**: VS Code Web interface (host_port + 1) -- **Port 8012**: VNC viewer for visual access - -### VS Code URL Generation - -The example retrieves the VS Code URL with authentication token: - -```python highlight={68-86} -# Get VSCode URL with token -vscode_port = (workspace.host_port or 8010) + 1 -try: - response = httpx.get( - f"{workspace.host}/api/vscode/url", - params={"workspace_dir": workspace.working_dir}, - ) - vscode_data = response.json() - vscode_url = vscode_data.get("url", "").replace( - "localhost:8001", f"localhost:{vscode_port}" - ) -except Exception: - # Fallback if server route not available - folder = ( - f"/{workspace.working_dir}" - if not str(workspace.working_dir).startswith("/") - else str(workspace.working_dir) - ) - vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" -``` - -This generates a properly authenticated URL with the workspace directory pre-opened. - -### Agent Task Execution - -The agent creates files that you can then inspect in VS Code: - -```python highlight={62-65} -logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") -logger.info("šŸ“ Sending first message...") -conversation.send_message("Create a simple Python script that prints Hello World") -conversation.run() -``` - -After the agent completes the task, you can open VS Code to see the generated files. 
- -### Interactive VS Code Access - -The example waits for user confirmation before exiting: - -```python highlight={88-102} -y = None -while y != "y": - y = input( - "\n" - "Because you've enabled extra_ports=True in DockerWorkspace, " - "you can open VSCode Web to see the workspace.\n\n" - f"VSCode URL: {vscode_url}\n\n" - "The VSCode should have the OpenHands settings extension installed:\n" - " - Dark theme enabled\n" - " - Auto-save enabled\n" - " - Telemetry disabled\n" - " - Auto-updates disabled\n\n" - "Press 'y' and Enter to exit and terminate the workspace.\n" - ">> " - ) -``` - -This gives you time to explore the workspace in VS Code before the container is cleaned up. - -## When to Use VS Code with Docker - -Use VS Code-enabled Docker workspaces when you need: - -- **Code Inspection**: Review files created by the agent -- **Manual Editing**: Make manual corrections or additions -- **Debugging**: Investigate issues in the workspace -- **Learning**: Understand what the agent is doing -- **Collaboration**: Share workspace access with team members - -## Benefits - -### Development Benefits - -- **Full IDE Experience**: Complete VS Code features in browser -- **No Local Setup**: No need to install VS Code locally -- **Isolated Environment**: All edits happen in container -- **Pre-configured**: Optimized settings for agent workflows - -### VS Code Features Available - -The VS Code Web instance includes: -- **Syntax Highlighting**: For all major languages -- **File Explorer**: Navigate workspace structure -- **Search**: Find text across files -- **Terminal**: Execute commands in container -- **Extensions**: Pre-installed development extensions - -### OpenHands-Optimized Settings - -The VS Code instance comes with: -- **Dark Theme**: Better visibility -- **Auto-save**: Automatic file saving -- **Telemetry Disabled**: Privacy-focused -- **Auto-updates Disabled**: Consistent environment - -## Accessing VS Code - -The VS Code URL format is: - -``` 
-http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir}
-```
-
-**URL Components:**
-- `vscode_port`: Usually host_port + 1 (e.g., 8011)
-- `tkn`: Authentication token for security
-- `folder`: Workspace directory to open
-
-## Use Cases
-
-### Code Review
-
-After the agent generates code:
-1. Agent creates files
-2. Open VS Code URL in browser
-3. Review code structure and quality
-4. Make manual adjustments if needed
-
-### Debugging
-
-When the agent encounters issues:
-1. Agent attempts to solve problem
-2. Open VS Code to inspect state
-3. Identify the issue manually
-4. Guide the agent with additional instructions
-
-### Learning
-
-To understand agent behavior:
-1. Give agent a complex task
-2. Watch it work through the problem
-3. Open VS Code to see incremental changes
-4. Learn agent's problem-solving approach
-
-## Next Steps
-
-- **[Browser with Docker](/sdk/guides/agent-server/browser-with-docker)** - Add browser capabilities
-- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Base Docker setup
-- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)** - Development basics

From f61e250a2ee23cba0653a231e3a8f65e15380229 Mon Sep 17 00:00:00 2001
From: Xingyao Wang
Date: Wed, 22 Oct 2025 17:26:19 -0400
Subject: [PATCH 32/58] tweak

---
 sdk/guides/agent-server/api-sandbox.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sdk/guides/agent-server/api-sandbox.mdx b/sdk/guides/agent-server/api-sandbox.mdx
index 650b4ad9..994f15aa 100644
--- a/sdk/guides/agent-server/api-sandbox.mdx
+++ b/sdk/guides/agent-server/api-sandbox.mdx
@@ -1,9 +1,9 @@
 ---
-title: API Sandboxed Server
+title: API-based Sandbox
 description: Connect to hosted API-based agent server for fully managed infrastructure.
 ---
 
-The API Sandboxed Server demonstrates how to use APIRemoteWorkspace to connect to a [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution.
+The API-sandboxed agent server demonstrates how to use `APIRemoteWorkspace` to connect to an [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution.
 
 ## Basic Example

From 029ded248848071f3ba47838266e8b31c3b4c81f Mon Sep 17 00:00:00 2001
From: Xingyao Wang
Date: Wed, 22 Oct 2025 17:32:10 -0400
Subject: [PATCH 33/58] simplify docker sandbox documentation

---
 sdk/guides/agent-server/docker-sandbox.mdx | 116 ++-------------------
 1 file changed, 6 insertions(+), 110 deletions(-)

diff --git a/sdk/guides/agent-server/docker-sandbox.mdx b/sdk/guides/agent-server/docker-sandbox.mdx
index 7570ba7f..4a4d3469 100644
--- a/sdk/guides/agent-server/docker-sandbox.mdx
+++ b/sdk/guides/agent-server/docker-sandbox.mdx
@@ -395,58 +395,7 @@ except Exception:
 
 This generates a properly authenticated URL with the workspace directory pre-opened.
 
-#### Agent Task Execution
-
-The agent creates files that you can then inspect in VS Code:
-
-```python highlight={62-65}
-logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}")
-logger.info("šŸ“ Sending first message...")
-conversation.send_message("Create a simple Python script that prints Hello World")
-conversation.run()
-```
-
-After the agent completes the task, you can open VS Code to see the generated files.
-
-#### Interactive VS Code Access
-
-The example waits for user confirmation before exiting:
-
-```python highlight={88-102}
-y = None
-while y != "y":
-    y = input(
-        "\n"
-        "Because you've enabled extra_ports=True in DockerWorkspace, "
-        "you can open VSCode Web to see the workspace.\n\n"
-        f"VSCode URL: {vscode_url}\n\n"
-        "The VSCode should have the OpenHands settings extension installed:\n"
-        " - Dark theme enabled\n"
-        " - Auto-save enabled\n"
-        " - Telemetry disabled\n"
-        " - Auto-updates disabled\n\n"
-        "Press 'y' and Enter to exit and terminate the workspace.\n"
-        ">> "
-    )
-```
-
-This gives you time to explore the workspace in VS Code before the container is cleaned up.
-
-#### When to Use VS Code with Docker
-
-Use VS Code-enabled Docker workspaces when you need:
-- Code Inspection: Review files created by the agent
-- Manual Editing: Make manual corrections or additions
-- Debugging: Investigate issues in the workspace
-- Learning: Understand what the agent is doing
-- Collaboration: Share workspace access with team members
-
-#### Benefits
-
-- Full IDE Experience: Complete VS Code features in browser
-- No Local Setup: No need to install VS Code locally
-- Isolated Environment: All edits happen in container
-- Pre-configured: Optimized settings for agent workflows
+Read the API Reference [here](/sdk/guides/agent-server/api-reference/vscode/get-vscode-url) for more information.
 
 #### VS Code URL Format
 
@@ -584,8 +533,8 @@ with DockerWorkspace(
 ```
 
 The `extra_ports=True` setting exposes additional ports for:
-- Port 8011: VS Code Web interface
-- Port 8012: VNC viewer for browser visualization
+- Port `host_port+1`: VS Code Web interface
+- Port `host_port+2`: VNC viewer for browser visualization
 
 #### Enabling Browser Tools
 
@@ -601,68 +550,13 @@ agent = get_default_agent(
 ```
 
 When `cli_mode=False`, the agent gains access to browser automation tools for web interaction.
-
-#### Sending Browser Tasks
-
-The agent can perform web automation tasks:
-
-```python highlight={72-76}
-logger.info("šŸ“ Sending first message...")
-conversation.send_message(
-    "Could you go to https://all-hands.dev/ blog page and summarize main "
-    "points of the latest blog?"
-)
-```
-
-The agent will use browser tools to navigate to the URL, read the content, and provide a summary.
-
-#### Visual Browser Access
-
-With `extra_ports=True`, you can watch the browser in real-time via VNC:
-
-```python highlight={80-89}
-y = None
-while y != "y":
-    y = input(
-        "Because you've enabled extra_ports=True in DockerWorkspace, "
-        "you can open a browser tab to see the *actual* browser OpenHands "
-        "is interacting with via VNC.\n\n"
-        "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n"
-        "Press 'y' and Enter to exit and terminate the workspace.\n"
-        ">> "
-    )
-```
-
-This allows you to:
-- See exactly what the agent is doing in the browser
-- Debug browser automation issues
-- Understand agent behavior visually
-
-Demo video:
+When VNC is available and `extra_ports=True` is set, the browser is opened in the VNC desktop, so you can watch the agent's work in real time via VNC.
 Demo video:
 
-#### When to Use Browser with Docker
-
-Use browser-enabled Docker workspaces when you need:
-- Web Scraping: Extract data from websites safely
-- Web Automation: Automate web-based workflows
-- Testing: Test web applications in isolated environments
-- Visual Monitoring: Watch agent interactions in real-time
-- Security: Isolate web browsing from your host system
-
-#### Benefits
-
-- Isolation: Browser runs in container, not on your host
-- Clean State: Fresh browser for each run
-- No Pollution: No cookies, cache, or history on your machine
-- Visual Debugging: Watch the browser via VNC
-- Reproducible: Same environment every time
-- Easy Cleanup: Container removal clears everything
-- Production: Control CPU/memory, run concurrent sessions, monitor usage
-
 #### VNC Access
 
 The VNC interface provides real-time visual access to the browser:
 
@@ -674,6 +568,8 @@ http://localhost:8012/vnc.html?autoconnect=1&resize=remote
 - autoconnect=1: Automatically connect to VNC server
 - resize=remote: Automatically adjust resolution
 
+Read the API Reference [here](/sdk/guides/agent-server/api-reference/desktop/get-desktop-url) for more information.
+
 ---
 
 ## Next Steps

From db6de5249f1b9ae2334ebff06101ebcaf0331834 Mon Sep 17 00:00:00 2001
From: Xingyao Wang
Date: Wed, 22 Oct 2025 17:36:31 -0400
Subject: [PATCH 34/58] tweaks

---
 sdk/guides/agent-server/api-sandbox.mdx    | 6 +++---
 sdk/guides/agent-server/docker-sandbox.mdx | 7 ++++---
 sdk/guides/agent-server/overview.mdx       | 8 +++-----
 sdk/guides/custom-tools.mdx                | 7 ++++---
 sdk/guides/mcp.mdx                         | 2 +-
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/sdk/guides/agent-server/api-sandbox.mdx b/sdk/guides/agent-server/api-sandbox.mdx
index 994f15aa..c7c3cfca 100644
--- a/sdk/guides/agent-server/api-sandbox.mdx
+++ b/sdk/guides/agent-server/api-sandbox.mdx
@@ -181,6 +181,6 @@ All agent execution happens on the remote runtime infrastructure.
 
 ## Next Steps
 
-- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Self-hosted alternative
-- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)** - Development setup
-- **[Agent Server Package Architecture](/sdk/arch/agent-server-package)** - Architecture and design decisions
+- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)**
+- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)**
+- **[Agent Server Package Architecture](/sdk/arch/agent-server-package)**
diff --git a/sdk/guides/agent-server/docker-sandbox.mdx b/sdk/guides/agent-server/docker-sandbox.mdx
index 4a4d3469..e07b44da 100644
--- a/sdk/guides/agent-server/docker-sandbox.mdx
+++ b/sdk/guides/agent-server/docker-sandbox.mdx
@@ -574,6 +574,7 @@ Read the API Reference [here](/sdk/guides/agent-server/api-reference/desktop/get
 
 ## Next Steps
 
-- API Sandboxed Server: /sdk/guides/agent-server/api-sandbox
-- Local Agent Server: /sdk/guides/agent-server/local-server
-- Agent Server Overview: /sdk/guides/agent-server/overview
+- **[Local Agent Server](/sdk/guides/agent-server/local-server)**
+- **[Agent Server Overview](/sdk/guides/agent-server/overview)**
+- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service
+- **[Agent Server Package Architecture](/sdk/arch/agent-server-package)** - Architecture and design decisions
diff --git a/sdk/guides/agent-server/overview.mdx b/sdk/guides/agent-server/overview.mdx
index 6480a4d3..87525cdc 100644
--- a/sdk/guides/agent-server/overview.mdx
+++ b/sdk/guides/agent-server/overview.mdx
@@ -148,11 +148,9 @@ Switching from local to remote is just a matter of swapping the workspace class
 
 Explore different deployment options:
 
-- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)** - Development setup
-- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)** - Self-hosted production
-- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server)** - Managed hosting
-- **[Browser with Docker](/sdk/guides/agent-server/browser-with-docker)** - Web automation
-- **[VS Code with Docker](/sdk/guides/agent-server/vscode-with-docker)** - Code editing
+- **[Local Agent Server](/sdk/guides/agent-server/local-agent-server)**
+- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandboxed-server)**
+- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandboxed-server)**
 
 For architectural details:
 - **[Agent Server Package Architecture](/sdk/arch/agent-server-package)** - Remote execution architecture and deployment
diff --git a/sdk/guides/custom-tools.mdx b/sdk/guides/custom-tools.mdx
index 12b39b92..8426c10b 100644
--- a/sdk/guides/custom-tools.mdx
+++ b/sdk/guides/custom-tools.mdx
@@ -17,7 +17,7 @@ tools = get_default_tools()
 agent = Agent(llm=llm, tools=tools)
 ```
 
-See [Tools Overview](/sdk/arch/tools/overview) for the complete list of available tools.
+See [Tools Package Architecture](/sdk/arch/tools-package) for the complete list of available tools and design philosophy.
 
 ## Understanding the Tool System
 
@@ -27,7 +27,7 @@ The SDK's tool system is built around three core components:
 2. **Observation** - Defines output data (what the tool returns)
 3. **Executor** - Implements the tool's logic (what the tool does)
 
-These components are tied together by a **ToolDefinition** that registers the tool with the agent. For architectural details and advanced usage patterns, see [Tool System Architecture](/sdk/arch/sdk/tool).
+These components are tied together by a **ToolDefinition** that registers the tool with the agent. For architectural details and design principles, see [SDK Package Architecture - Tool System](/sdk/arch/sdk-package#4-tool-system---typed-capabilities).
 
 ## Creating a Custom Tool
 
@@ -308,5 +308,6 @@ Create custom tools when you need to:
 
 ## Next Steps
 
-- **[Tool System Architecture](/sdk/arch/sdk/tool)** - Deep dive into the tool system
+- **[SDK Package Architecture](/sdk/arch/sdk-package)** - Deep dive into the tool system and other SDK components
+- **[Tools Package Architecture](/sdk/arch/tools-package)** - Built-in tools design philosophy
 - **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers
diff --git a/sdk/guides/mcp.mdx b/sdk/guides/mcp.mdx
index 5d0d91e8..1063dd19 100644
--- a/sdk/guides/mcp.mdx
+++ b/sdk/guides/mcp.mdx
@@ -242,6 +242,6 @@ mcp_config = {
 
 ## Next Steps
 
-- **[MCP Architecture](/sdk/arch/sdk/mcp)** - Technical details and internals
+- **[SDK Package Architecture - MCP](/sdk/arch/sdk-package#8-mcp---model-context-protocol)** - Technical details and design decisions
 - **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools
 - **[Security Analyzer](/sdk/guides/security)** - Securing tool usage

From 383a3e6959073acbad320d98c82e118d010ae75d Mon Sep 17 00:00:00 2001
From: Xingyao Wang
Date: Wed, 22 Oct 2025 17:37:02 -0400
Subject: [PATCH 35/58] improve docs

---
 docs.json | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/docs.json b/docs.json
index 9d330338..9b3f803b 100644
--- a/docs.json
+++ b/docs.json
@@ -220,15 +220,14 @@
           "group": "Remote Agent Server",
           "pages": [
             "sdk/guides/agent-server/overview",
-            "sdk/guides/agent-server/local-agent-server",
-            "sdk/guides/agent-server/docker-sandboxed-server",
-            "sdk/guides/agent-server/api-sandboxed-server",
-            "sdk/guides/agent-server/browser-with-docker",
-            "sdk/guides/agent-server/vscode-with-docker",
+            "sdk/guides/agent-server/local-server",
+            "sdk/guides/agent-server/docker-sandbox",
+            "sdk/guides/agent-server/api-sandbox",
             {
               "group": "API Reference",
               "openapi": {
-                "source": "/openapi/agent-sdk.json"
+                "source": "/openapi/agent-sdk.json",
+                "directory": "sdk/guides/agent-server/api-reference"
               }
             }
           ]
@@ -243,14 +243,11 @@
     {
       "group": "Architecture",
       "pages": [
-        {
-          "group": "Language Models",
-          "pages": [
-            "sdk/arch/llms/index",
-            "sdk/arch/llms/configuration",
-            "sdk/arch/llms/providers"
-          ]
-        }
+        "sdk/arch/overview",
+        "sdk/arch/sdk-package",
+        "sdk/arch/tools-package",
+        "sdk/arch/workspace-package",
+        "sdk/arch/agent-server-package"
       ]
     }
   ]

From 83c8badcf562d14979f4d8e163ce4f0866c03636 Mon Sep 17 00:00:00 2001
From: Xingyao Wang
Date: Wed, 22 Oct 2025 17:39:06 -0400
Subject: [PATCH 36/58] remove arch docs

---
 sdk/arch/agent_server/overview.mdx      | 433 ---------------------
 sdk/arch/overview.mdx                   | 142 -------
 sdk/arch/sdk/agent.mdx                  | 301 ---------------
 sdk/arch/sdk/condenser.mdx              | 166 --------
 sdk/arch/sdk/conversation.mdx           | 487 ------------------------
 sdk/arch/sdk/event.mdx                  | 403 --------------------
 sdk/arch/sdk/llm.mdx                    | 416 --------------------
 sdk/arch/sdk/mcp.mdx                    | 333 ----------------
 sdk/arch/sdk/microagents.mdx            | 225 -----------
 sdk/arch/sdk/security.mdx               | 416 --------------------
 sdk/arch/sdk/tool.mdx                   | 199 ----------
 sdk/arch/sdk/workspace.mdx              | 322 ----------------
 sdk/arch/tools/bash.mdx                 | 288 --------------
 sdk/arch/tools/browser_use.mdx          | 101 -----
 sdk/arch/tools/file_editor.mdx          | 338 ----------------
 sdk/arch/tools/glob.mdx                 |  89 -----
 sdk/arch/tools/grep.mdx                 | 140 -------
 sdk/arch/tools/overview.mdx             | 185 ---------
 sdk/arch/tools/planning_file_editor.mdx | 128 -------
 sdk/arch/tools/task_tracker.mdx         | 146 -------
 sdk/arch/workspace/docker.mdx           | 330 ----------------
 sdk/arch/workspace/overview.mdx         |  99 -----
 sdk/arch/workspace/remote_api.mdx       | 325 ----------------
 23 files changed, 6012 deletions(-)
 delete mode 100644 sdk/arch/agent_server/overview.mdx
 delete mode 100644 sdk/arch/overview.mdx
 delete mode 100644 sdk/arch/sdk/agent.mdx
 delete mode 100644 sdk/arch/sdk/condenser.mdx
 delete mode 100644 sdk/arch/sdk/conversation.mdx
 delete mode 100644 sdk/arch/sdk/event.mdx
 delete mode 100644 sdk/arch/sdk/llm.mdx
 delete mode 100644 sdk/arch/sdk/mcp.mdx
 delete mode 100644 sdk/arch/sdk/microagents.mdx
 delete mode 100644 sdk/arch/sdk/security.mdx
 delete mode 100644 sdk/arch/sdk/tool.mdx
 delete mode 100644 sdk/arch/sdk/workspace.mdx
 delete mode 100644 sdk/arch/tools/bash.mdx
 delete mode 100644 sdk/arch/tools/browser_use.mdx
 delete mode 100644 sdk/arch/tools/file_editor.mdx
 delete mode 100644 sdk/arch/tools/glob.mdx
 delete mode 100644 sdk/arch/tools/grep.mdx
 delete mode 100644 sdk/arch/tools/overview.mdx
 delete mode 100644 sdk/arch/tools/planning_file_editor.mdx
 delete mode 100644 sdk/arch/tools/task_tracker.mdx
 delete mode 100644 sdk/arch/workspace/docker.mdx
 delete mode 100644 sdk/arch/workspace/overview.mdx
 delete mode 100644 sdk/arch/workspace/remote_api.mdx

diff --git a/sdk/arch/agent_server/overview.mdx b/sdk/arch/agent_server/overview.mdx
deleted file mode 100644
index 593cb646..00000000
--- a/sdk/arch/agent_server/overview.mdx
+++ /dev/null
@@ -1,433 +0,0 @@
----
-title: Agent Server
-description: HTTP server for remote agent execution with Docker-based sandboxing and API access.
----
-
-The Agent Server provides HTTP API endpoints for remote agent execution. It enables centralized agent management, multi-user support, and production deployments.
-
-**Source**: [`openhands/agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server)
-
-## Purpose
-
-The Agent Server enables:
-- **Remote Execution**: Run agents on dedicated servers
-- **Multi-User Support**: Isolate execution per user
-- **Resource Management**: Centralized resource allocation
-- **API Access**: HTTP API for agent operations
-- **Production Deployment**: Scalable agent infrastructure
-
-## Architecture
-
-```mermaid
-graph TD
-    Client[Client SDK] -->|HTTPS| Server[Agent Server]
-    Server --> Router[FastAPI Router]
-
-    Router --> Workspace[Workspace API]
-    Router --> Health[Health Check]
-
-    Workspace --> Docker[Docker Manager]
-    Docker --> Container1[Container 1]
-    Docker --> Container2[Container 2]
-
-    style Client fill:#e1f5fe
-    style Server fill:#fff3e0
-    style Router fill:#e8f5e8
-    style Docker fill:#f3e5f5
-```
-
-## Quick Start
-
-### Using Pre-built Docker Image
-
-```bash
-# Pull latest image
-docker pull ghcr.io/all-hands-ai/agent-server:latest
-
-# Run server
-docker run -d \
-  -p 8000:8000 \
-  -v /var/run/docker.sock:/var/run/docker.sock \
-  ghcr.io/all-hands-ai/agent-server:latest
-```
-
-### Using Python
-
-```bash
-# Install agent-server package
-pip install openhands-agent-server
-
-# Start server
-openhands-agent-server
-```
-
-## Building Docker Images
-
-**Source**: [`openhands/agent_server/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/agent_server/docker)
-
-### Build Script
-
-```bash
-# Build from source
-python -m openhands.agent_server.docker.build \
-  --base-image ubuntu:22.04 \
-  --target runtime \
-  --platform linux/amd64
-```
-
-### Build Options
-
-| Option | Description | Default |
-|--------|-------------|---------|
-| `--base-image` | Base Docker image | `ubuntu:22.04` |
-| `--target` | Build target (`runtime` or `dev`) | `runtime` |
-| `--platform` | Target platform | Host platform |
-| `--output-image` | Output image name | Auto-generated |
-
-### Programmatic Build
-
-```python
-from openhands.agent_server.docker.build import (
-    BuildOptions,
-    build
-)
-
-# Build custom image
-image_name = build(
-    BuildOptions(
-        base_image="python:3.12",
-        target="runtime",
-        platform="linux/amd64"
-    )
-)
-
-print(f"Built image: {image_name}")
-```
-
-## Docker Images
-
-### Official Images
-
-```bash
-# Latest release
-ghcr.io/all-hands-ai/agent-server:latest
-
-# Specific version
-ghcr.io/all-hands-ai/agent-server:v1.0.0
-
-# Development build
-ghcr.io/all-hands-ai/agent-server:dev
-```
-
-### Image Variants
-
-- **`runtime`**: Production-ready, minimal size
-- **`dev`**: Development tools included
-
-## API Endpoints
-
-### Health Check
-
-```bash
-GET /api/health
-```
-
-Returns server health status.
-
-### Execute Command
-
-```bash
-POST /api/workspace/command
-Content-Type: application/json
-Authorization: Bearer
-
-{
-    "command": "python script.py",
-    "working_dir": "/workspace",
-    "timeout": 30.0
-}
-```
-
-### File Upload
-
-```bash
-POST /api/workspace/upload
-Authorization: Bearer
-Content-Type: multipart/form-data
-
-# Form data with file
-```
-
-### File Download
-
-```bash
-GET /api/workspace/download?path=/workspace/output.txt
-Authorization: Bearer
-```
-
-## Configuration
-
-### Environment Variables
-
-```bash
-# Server configuration
-export HOST=0.0.0.0
-export PORT=8000
-export API_KEY=your-secret-key
-
-# Docker configuration
-export DOCKER_HOST=unix:///var/run/docker.sock
-
-# Logging
-export LOG_LEVEL=INFO
-export DEBUG=false
-```
-
-### Server Settings
-
-```python
-# config.py
-class Settings:
-    host: str = "0.0.0.0"
-    port: int = 8000
-    api_key: str = "your-secret-key"
-    workers: int = 4
-    timeout: float = 300.0
-```
-
-## Deployment
-
-### Docker Compose
-
-```yaml
-# docker-compose.yml
-version: '3.8'
-
-services:
-  agent-server:
-    image: ghcr.io/all-hands-ai/agent-server:latest
-    ports:
-      - "8000:8000"
-    volumes:
-      - /var/run/docker.sock:/var/run/docker.sock
-    environment:
-      - API_KEY=your-secret-key
-      - LOG_LEVEL=INFO
-    restart: unless-stopped
-```
-
-### Kubernetes
-
-```yaml
-# deployment.yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: agent-server
-spec:
-  replicas: 3
-  selector:
-    matchLabels:
-      app: agent-server
-  template:
-    metadata:
-      labels:
-        app: agent-server
-    spec:
-      containers:
-      - name: agent-server
-        image: ghcr.io/all-hands-ai/agent-server:latest
-        ports:
-        - containerPort: 8000
-        env:
-        - name: API_KEY
-          valueFrom:
-            secretKeyRef:
-              name: agent-server-secrets
-              key: api-key
-```
-
-### Systemd Service
-
-```ini
-# /etc/systemd/system/agent-server.service
-[Unit]
-Description=OpenHands Agent Server
-After=docker.service
-Requires=docker.service
-
-[Service]
-Type=simple
-ExecStart=/usr/bin/docker run \
-  --rm \
-  -p 8000:8000 \
-  -v /var/run/docker.sock:/var/run/docker.sock \
-  ghcr.io/all-hands-ai/agent-server:latest
-
-Restart=always
-RestartSec=10
-
-[Install]
-WantedBy=multi-user.target
-```
-
-## Security
-
-### Authentication
-
-```python
-# API key authentication
-from fastapi import Header, HTTPException
-
-async def verify_api_key(authorization: str = Header(None)):
-    if not authorization or not authorization.startswith("Bearer "):
-        raise HTTPException(status_code=401)
-
-    api_key = authorization.split(" ")[1]
-    if api_key != expected_api_key:
-        raise HTTPException(status_code=403)
-```
-
-### Container Isolation
-
-- Each request executes in separate Docker container
-- Containers have resource limits
-- Network isolation between containers
-- Automatic cleanup after execution
-
-### Rate Limiting
-
-```python
-# Implement rate limiting per API key
-from slowapi import Limiter
-
-limiter = Limiter(key_func=lambda: request.headers.get("Authorization"))
-
-@app.post("/api/workspace/command")
-@limiter.limit("100/minute")
-async def execute_command(...):
-    ...
-```
-
-## Monitoring
-
-### Health Checks
-
-```bash
-# Check if server is running
-curl http://localhost:8000/api/health
-
-# Response:
-# {"status": "healthy", "version": "1.0.0"}
-```
-
-### Logging
-
-```python
-# Structured logging
-import logging
-
-logger = logging.getLogger("agent_server")
-logger.info("Request received", extra={
-    "user_id": user_id,
-    "command": command,
-    "duration": duration
-})
-```
-
-### Metrics
-
-Track important metrics:
-- Request rate and latency
-- Container creation/cleanup time
-- Resource usage per container
-- Error rates and types
-
-## Troubleshooting
-
-### Server Won't Start
-
-```bash
-# Check port availability
-netstat -tuln | grep 8000
-
-# Check Docker socket
-docker ps
-
-# Check logs
-docker logs agent-server
-```
-
-### Container Creation Fails
-
-```bash
-# Verify Docker permissions
-docker run hello-world
-
-# Check Docker socket mount
-ls -la /var/run/docker.sock
-
-# Check available resources
-docker stats
-```
-
-### Performance Issues
-
-```bash
-# Check resource usage
-docker stats
-
-# Increase worker count
-export WORKERS=8
-
-# Optimize container startup
-# Use pre-built images
-# Reduce image size
-```
-
-## Best Practices
-
-1. **Use Pre-built Images**: Faster startup, consistent environment
-2. **Set Resource Limits**: Prevent resource exhaustion
-3. **Enable Monitoring**: Track performance and errors
-4. **Implement Rate Limiting**: Prevent abuse
-5. **Secure API Keys**: Use strong, rotated keys
-6. **Use HTTPS**: Encrypt data in transit
-7. **Regular Updates**: Keep images updated
-8. **Backup Configuration**: Version control configurations
-
-## Development
-
-### Running Locally
-
-```bash
-# Clone repository
-git clone https://github.com/All-Hands-AI/agent-sdk.git
-cd agent-sdk
-
-# Install dependencies
-pip install -e ".[server]"
-
-# Run development server
-uvicorn openhands.agent_server.main:app --reload
-```
-
-### Testing
-
-```bash
-# Run tests
-pytest openhands/agent_server/tests/
-
-# Test specific endpoint
-curl -X POST http://localhost:8000/api/workspace/command \
-  -H "Authorization: Bearer test-key" \
-  -H "Content-Type: application/json" \
-  -d '{"command": "echo test", "working_dir": "/workspace"}'
-```
-
-## See Also
-
-- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based local execution
-- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Client for agent server
-- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Server usage examples
-- **[FastAPI Documentation](https://fastapi.tiangolo.com/)** - Web framework used
diff --git a/sdk/arch/overview.mdx b/sdk/arch/overview.mdx
deleted file mode 100644
index 6662ba9b..00000000
--- a/sdk/arch/overview.mdx
+++ /dev/null
@@ -1,142 +0,0 @@
----
-title: Overview
-description: A modular framework for building AI agents, organized into four packages for clarity and extensibility.
----
-
-The OpenHands Agent SDK is organized into four packages, each serving a distinct purpose in the agent development lifecycle.
-
-## Package Structure
-
-```mermaid
-graph TD
-    SDK[SDK Package<br/>Core Framework] --> Tools[Tools Package<br/>Built-in Tools]
-    SDK --> Workspace[Workspace Package<br/>Execution Environments]
-    SDK --> AgentServer[Agent Server Package<br/>Remote Execution]
-
-    Tools -.->|Used by| SDK
-    Workspace -.->|Used by| SDK
-    AgentServer -.->|Hosts| SDK
-
-    style SDK fill:#e1f5fe
-    style Tools fill:#e8f5e8
-    style Workspace fill:#fff3e0
-    style AgentServer fill:#f3e5f5
-```
-
-## 1. SDK Package
-
-Core framework for building agents locally.
-
-**Key Components:**
-- **[Tool System](/sdk/architecture/sdk/tool)** - Define custom capabilities
-- **[Microagents](/sdk/architecture/sdk/microagents)** - Specialized behavior modules
-- **[Condenser](/sdk/architecture/sdk/condenser)** - Memory management
-- **[Agent](/sdk/architecture/sdk/agent)** - Base agent interface
-- **[Workspace](/sdk/architecture/sdk/workspace)** - Execution abstraction
-- **[Conversation](/sdk/architecture/sdk/conversation)** - Lifecycle management
-- **[Event](/sdk/architecture/sdk/event)** - Event system
-- **[LLM](/sdk/architecture/sdk/llm)** - Language model integration
-- **[MCP](/sdk/architecture/sdk/mcp)** - Model Context Protocol
-- **[Security](/sdk/architecture/sdk/security)** - Security framework
-
-## 2. Tools Package
-
-Production-ready tool implementations.
-
-**Available Tools:**
-- **[BashTool](/sdk/architecture/tools/bash)** - Command execution
-- **[FileEditorTool](/sdk/architecture/tools/file_editor)** - File manipulation
-- **[GlobTool](/sdk/architecture/tools/glob)** - File discovery
-- **[GrepTool](/sdk/architecture/tools/grep)** - Content search
-- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker)** - Task management
-- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor)** - Multi-file workflows
-- **[BrowserUseTool](/sdk/architecture/tools/browser_use)** - Web interaction
-
-## 3. Workspace Package
-
-Advanced execution environments for production.
-
-**Workspace Types:**
-- **[DockerWorkspace](/sdk/architecture/workspace/docker)** - Container-based isolation
-- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api)** - Remote server execution
-
-See [Workspace Overview](/sdk/architecture/workspace/overview) for comparison.
-
-## 4. Agent Server Package
-
-HTTP server for centralized agent execution.
-
-**Capabilities:**
-- Remote agent execution via API
-- Multi-user isolation
-- Container management
-- Resource allocation
-
-See [Agent Server Documentation](/sdk/architecture/agent_server/overview).
-
-## Component Interaction
-
-```mermaid
-graph LR
-    User[User] -->|Message| Conv[Conversation]
-    Conv -->|Manages| Agent[Agent]
-
-    Agent -->|Reasons with| LLM[LLM]
-    Agent -->|Executes| Tools[Tools]
-    Agent -->|Guided by| Micro[Microagents]
-
-    Tools -->|Run in| Workspace[Workspace]
-
-    style User fill:#e1f5fe
-    style Conv fill:#fff3e0
-    style Agent fill:#f3e5f5
-    style LLM fill:#e8f5e8
-    style Tools fill:#fce4ec
-    style Workspace fill:#e0f2f1
-```
-
-## Design Principles
-
-### Immutability & Serialization
-All core classes are:
-- **Immutable**: State changes create new instances
-- **Serializable**: Full conversation state can be saved/restored
-- **Type-safe**: Pydantic models ensure data integrity
-
-### Modularity
-- **Composable**: Mix and match components as needed
-- **Extensible**: Add custom tools, LLMs, or workspaces
-- **Testable**: Each component can be tested in isolation
-
-### Backward Compatibility
-- **Semantic versioning** indicates compatibility levels
-- **Migration guides** provided for major changes
-
-## Getting Started
-
-New to the SDK? Start with the guides:
-
-- **[Getting Started](/sdk/guides/getting-started)** - Quick introduction
-- **[Streaming Mode](/sdk/guides/streaming-mode)** - Execution patterns
-- **[Tools & MCP](/sdk/guides/tools-and-mcp)** - Extending capabilities
-- **[Workspaces](/sdk/guides/workspaces)** - Execution environments
-- **[Sub-agents](/sdk/guides/subagents)** - Agent delegation
-
-## Deep Dive
-
-Explore individual components:
-
-- **SDK Package** - [Tool](/sdk/architecture/sdk/tool) | [Agent](/sdk/architecture/sdk/agent) | [LLM](/sdk/architecture/sdk/llm) | [Conversation](/sdk/architecture/sdk/conversation)
-- **Tools Package** - [BashTool](/sdk/architecture/tools/bash) | [FileEditorTool](/sdk/architecture/tools/file_editor)
-- **Workspace Package** - [DockerWorkspace](/sdk/architecture/workspace/docker) | [RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api)
-- **Agent Server** - [Overview](/sdk/architecture/agent_server/overview)
-
-## Examples
-
-Browse the [`examples/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples) directory for practical implementations:
-
-- **Hello World** - Basic agent usage
-- **Custom Tools** - Creating new capabilities
-- **Docker Workspace** - Sandboxed execution
-- **MCP Integration** - External tool servers
-- **Planning Agent** - Multi-step workflows
diff --git a/sdk/arch/sdk/agent.mdx b/sdk/arch/sdk/agent.mdx
deleted file mode 100644
index 3c0da066..00000000
--- a/sdk/arch/sdk/agent.mdx
+++ /dev/null
@@ -1,301 +0,0 @@
----
-title: Agent
-description: Core orchestrator combining language models with tools to execute tasks through structured reasoning loops.
----
-
-The Agent orchestrates LLM reasoning with tool execution to solve tasks. It manages the reasoning loop, system prompts, and state transitions while maintaining conversation context.
- -**Source**: [`openhands/sdk/agent/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent) - -## Core Concepts - -```mermaid -graph TD - Agent[Agent] --> LLM[LLM] - Agent --> Tools[Tools] - Agent --> Context[AgentContext] - Agent --> Condenser[Condenser] - - Context --> Microagents[Microagents] - Tools --> Bash[BashTool] - Tools --> FileEditor[FileEditorTool] - Tools --> MCP[MCP Tools] - - style Agent fill:#e1f5fe - style LLM fill:#fff3e0 - style Tools fill:#e8f5e8 - style Context fill:#f3e5f5 -``` - -An agent combines: -- **LLM**: Language model for reasoning and decision-making -- **Tools**: Capabilities to interact with the environment -- **Context**: Additional knowledge and specialized expertise -- **Condenser**: Memory management for long conversations - -## Base Interface - -**Source**: [`openhands/sdk/agent/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/base.py) - -### AgentBase - -Abstract base class defining the agent interface: - -```python -from openhands.sdk.agent import AgentBase -from openhands.sdk.conversation import ConversationState - -class CustomAgent(AgentBase): - def step(self, state: ConversationState) -> ConversationState: - """Execute one reasoning step and return updated state.""" - # Your agent logic here - return updated_state -``` - -**Key Properties**: -- **Immutable**: Agents are frozen Pydantic models -- **Serializable**: Full agent configuration can be saved/restored -- **Type-safe**: Strict type checking with Pydantic validation - -## Agent Implementation - -**Source**: [`openhands/sdk/agent/agent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/agent.py) - -### Initialization Arguments - -```python -from openhands.sdk import Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[ - 
BashTool.create(), - FileEditorTool.create() - ], - mcp_config={}, # Optional MCP configuration - filter_tools_regex=None, # Optional regex to filter tools - agent_context=None, # Optional context with microagents - condenser=None, # Optional context condenser - security_analyzer=None, # Optional security analyzer - confirmation_policy=None, # Optional confirmation policy -) -``` - -### Key Parameters - -| Parameter | Type | Description | -|-----------|------|-------------| -| `llm` | `LLM` | Language model configuration (required) | -| `tools` | `list[Tool]` | Tools available to the agent | -| `mcp_config` | `dict` | MCP server configuration for external tools | -| `filter_tools_regex` | `str` | Regex to filter available tools | -| `agent_context` | `AgentContext` | Additional context and microagents | -| `condenser` | `CondenserBase` | Context condensation strategy | -| `security_analyzer` | `SecurityAnalyzer` | Security risk analysis | -| `confirmation_policy` | `ConfirmationPolicy` | Action confirmation strategy | - -## Agent Lifecycle - -```mermaid -sequenceDiagram - participant User - participant Conversation - participant Agent - participant LLM - participant Tools - - User->>Conversation: Start conversation - Conversation->>Agent: Initialize state - loop Until task complete - Conversation->>Agent: step(state) - Agent->>LLM: Generate response - LLM->>Agent: Tool calls + reasoning - Agent->>Tools: Execute actions - Tools->>Agent: Observations - Agent->>Conversation: Updated state - end - Conversation->>User: Final result -``` - -### Execution Flow - -1. **Initialization**: Create agent with LLM and tools -2. **State Setup**: Pass agent to conversation -3. **Reasoning Loop**: Conversation calls `agent.step(state)` repeatedly -4. **Tool Execution**: Agent executes tool calls from LLM -5. **State Updates**: Agent returns updated conversation state -6. 
**Termination**: Loop ends when agent calls `FinishTool` - -## Usage Examples - -### Basic Agent - -See [`examples/01_standalone_sdk/01_hello_world.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py): - -```python -from openhands.sdk import Agent, LLM, Conversation -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -# Create LLM -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") -) - -# Create agent -agent = Agent( - llm=llm, - tools=[ - BashTool.create(), - FileEditorTool.create() - ] -) - -# Use with conversation -conversation = Conversation(agent=agent) -conversation.send_message("Your task here") -conversation.run() -``` - -### Agent with Context - -See [`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py): - -```python -from openhands.sdk import Agent, AgentContext - -# Create context with microagents -context = AgentContext( - microagents=["testing_expert", "code_reviewer"] -) - -agent = Agent( - llm=llm, - tools=tools, - agent_context=context -) -``` - -### Agent with Memory Management - -See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): - -```python -from openhands.sdk.context import LLMCondenser - -condenser = LLMCondenser( - max_tokens=8000, - target_tokens=6000 -) - -agent = Agent( - llm=llm, - tools=tools, - condenser=condenser -) -``` - -### Agent with MCP Tools - -See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py): - -```python -mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } -} - -agent = Agent( - llm=llm, - tools=tools, - mcp_config=mcp_config -)
-``` - -### Planning Agent Workflow - -See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) for a complete example of multi-phase agent workflows. - -## System Prompts - -**Source**: [`openhands/sdk/agent/prompts/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/agent/prompts) - -Agents use Jinja2 templates for system prompts. Available templates: - -| Template | Use Case | Source | -|----------|----------|--------| -| `system_prompt.j2` | Default reasoning and tool usage | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt.j2) | -| `system_prompt_interactive.j2` | Interactive conversations | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_interactive.j2) | -| `system_prompt_long_horizon.j2` | Complex multi-step tasks | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_long_horizon.j2) | -| `system_prompt_planning.j2` | Planning-focused workflows | [View](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/agent/prompts/system_prompt_planning.j2) | - -### Custom Prompts - -Create custom agent classes with specialized prompts: - -```python -class PlanningAgent(Agent): - system_prompt_filename: str = "system_prompt_planning.j2" -``` - -## Custom Agent Development - -### Extending AgentBase - -```python -from openhands.sdk.agent import AgentBase -from openhands.sdk.conversation import ConversationState - -class SpecializedAgent(AgentBase): - # Custom configuration - max_iterations: int = 10 - - def step(self, state: ConversationState) -> ConversationState: - # Custom reasoning logic - # Tool selection and execution - # State management - return updated_state -``` - -### Multi-Agent Composition - -```python -class WorkflowAgent(AgentBase): - planning_agent: Agent - 
execution_agent: Agent - - def step(self, state: ConversationState) -> ConversationState: - # Phase 1: Planning - plan = self.planning_agent.step(state) - - # Phase 2: Execution - result = self.execution_agent.step(plan) - - return result -``` - -## Best Practices - -1. **Tool Selection**: Provide only necessary tools to reduce complexity -2. **Clear Instructions**: Use detailed user messages for better task understanding -3. **Context Management**: Use condensers for long-running conversations -4. **Error Handling**: Implement proper error recovery strategies -5. **Security**: Use confirmation policies for sensitive operations -6. **Testing**: Test agents with various scenarios and edge cases - -## See Also - -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Defining and using tools -- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing agent conversations -- **[LLM](/sdk/architecture/sdk/llm.mdx)** - Language model configuration -- **[MCP](/sdk/architecture/sdk/mcp.mdx)** - External tool integration -- **[Security](/sdk/architecture/sdk/security.mdx)** - Security and confirmation policies diff --git a/sdk/arch/sdk/condenser.mdx b/sdk/arch/sdk/condenser.mdx deleted file mode 100644 index 59d59da6..00000000 --- a/sdk/arch/sdk/condenser.mdx +++ /dev/null @@ -1,166 +0,0 @@ ---- -title: Context Condenser -description: Manage agent memory by intelligently compressing conversation history when approaching token limits. ---- - -The context condenser manages agent memory by intelligently compressing conversation history when approaching token limits. This enables agents to maintain coherent context in long-running conversations without exceeding LLM context windows. - -**Source**: [`openhands/sdk/context/condenser/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/condenser) - -## Why Context Condensation? 
- -```mermaid -graph LR - A[Long Conversation] --> B{Token Limit?} - B -->|Approaching| C[Condense] - B -->|Within Limit| D[Continue] - C --> E[Compressed Context] - E --> F[Agent with Memory] - D --> F - - style A fill:#e1f5fe - style C fill:#fff3e0 - style E fill:#e8f5e8 - style F fill:#f3e5f5 -``` - -As conversations grow, they may exceed LLM context windows. Condensers solve this by: -- Summarizing older messages while preserving key information -- Maintaining recent context in full detail -- Reducing token count without losing conversation coherence - -## LLM Condenser (Default) - -**Source**: [`openhands/sdk/context/condenser/llm_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/condenser/llm_condenser.py) - -The default condenser uses an LLM to intelligently summarize conversation history. - -### How It Works - -1. **Monitor Token Count**: Tracks conversation token usage -2. **Trigger Condensation**: Activates when approaching token threshold -3. **Summarize History**: Uses LLM to compress older messages -4. **Preserve Recent**: Keeps recent messages uncompressed -5. 
**Update Context**: Replaces verbose history with summary - -### Configuration - -```python -from openhands.sdk.context import LLMCondenser - -condenser = LLMCondenser( - max_tokens=8000, # Trigger condensation at this limit - target_tokens=6000, # Reduce to this token count - preserve_recent=10 # Keep last N messages uncompressed -) - -agent = Agent( - llm=llm, - tools=tools, - condenser=condenser -) -``` - -### Example Usage - -See [`examples/01_standalone_sdk/14_context_condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py): - -```python -from openhands.sdk import Agent, LLM -from openhands.sdk.context import LLMCondenser -from pydantic import SecretStr - -# Configure condenser -condenser = LLMCondenser( - max_tokens=8000, - target_tokens=6000 -) - -# Create agent with condenser -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") -) - -agent = Agent( - llm=llm, - tools=tools, - condenser=condenser -) -``` - -## Condensation Strategy - -### Multi-Phase Approach - -```mermaid -sequenceDiagram - participant Agent - participant Condenser - participant LLM - - Agent->>Condenser: Check token count - Condenser->>Condenser: Exceeds threshold? 
- Condenser->>LLM: Summarize old messages - LLM->>Condenser: Summary - Condenser->>Agent: Updated context - Agent->>Agent: Continue with condensed history -``` - -### What Gets Condensed - -- **System messages**: Preserved as-is -- **Recent messages**: Kept in full (configurable count) -- **Older messages**: Summarized into compact form -- **Tool results**: Preserved for reference -- **User preferences**: Maintained across condensation - -## Custom Condensers - -Implement custom condensation strategies by extending the base class: - -```python -from openhands.sdk.context import CondenserBase -from openhands.sdk.event import ConversationState - -class CustomCondenser(CondenserBase): - def condense(self, state: ConversationState) -> ConversationState: - """Implement custom condensation logic.""" - # Your condensation algorithm - return condensed_state - - def should_condense(self, state: ConversationState) -> bool: - """Determine when to trigger condensation.""" - # Your trigger logic - return token_count > threshold -``` - -## Best Practices - -1. **Set Appropriate Thresholds**: Leave buffer room below actual limit -2. **Preserve Recent Context**: Keep enough messages for coherent flow -3. **Monitor Performance**: Track condensation frequency and effectiveness -4. **Test Condensation**: Verify important information isn't lost -5. 
**Adjust Per Use Case**: Different tasks need different settings - -## Configuration Guidelines - -| Use Case | max_tokens | target_tokens | preserve_recent | -|----------|-----------|---------------|-----------------| -| Short tasks | 4000 | 3000 | 5 | -| Medium conversations | 8000 | 6000 | 10 | -| Long-running agents | 16000 | 12000 | 20 | -| Code-heavy tasks | 12000 | 10000 | 15 | - -## Performance Considerations - -- **Condensation Cost**: Uses additional LLM calls -- **Latency**: Brief pause during condensation -- **Context Quality**: Trade-off between compression and information retention -- **Frequency**: Tune thresholds to minimize condensation events - -## See Also - -- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using condensers with agents -- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py)** - Working example -- **[Conversation State](/sdk/architecture/sdk/conversation.mdx)** - Managing conversation state diff --git a/sdk/arch/sdk/conversation.mdx b/sdk/arch/sdk/conversation.mdx deleted file mode 100644 index e702fb36..00000000 --- a/sdk/arch/sdk/conversation.mdx +++ /dev/null @@ -1,487 +0,0 @@ ---- -title: Conversation -description: Manage agent lifecycles through structured message flows and state persistence. ---- - -The Conversation class orchestrates agent execution through structured message flows. It manages the agent lifecycle, state persistence, and provides APIs for interaction and monitoring. 
- -**Source**: [`openhands/sdk/conversation/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/conversation) - -## Core Concepts - -```mermaid -graph LR - User[User] --> Conversation[Conversation] - Conversation --> Agent[Agent] - Conversation --> State[ConversationState] - Conversation --> Events[Event History] - - Agent --> Step[step()] - State --> Persistence[Persistence] - - style Conversation fill:#e1f5fe - style Agent fill:#f3e5f5 - style State fill:#fff3e0 - style Events fill:#e8f5e8 -``` - -A conversation: -- **Manages Agent Lifecycle**: Initializes and runs agents until completion -- **Handles State**: Maintains conversation history and context -- **Enables Interaction**: Send messages and receive responses -- **Provides Persistence**: Save and restore conversation state -- **Monitors Progress**: Track execution stats and events - -## Basic API - -**Source**: [`openhands/sdk/conversation/conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation.py) - -### Creating a Conversation - -```python -from openhands.sdk import Conversation, Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -# Create agent -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Create conversation -conversation = Conversation( - agent=agent, - workspace="workspace/project", # Working directory - persistence_dir="conversations", # Save conversation state - max_iteration_per_run=500, # Max steps per run - stuck_detection=True, # Detect infinite loops - visualize=True # Generate execution visualizations -) -``` - -### Constructor Parameters - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `agent` | `AgentBase` | *Required* | Agent to run in the conversation | -| `workspace` | `str \| LocalWorkspace \| 
RemoteWorkspace` | `"workspace/project"` | Execution environment | -| `persistence_dir` | `str \| None` | `None` | Directory for saving state | -| `conversation_id` | `ConversationID \| None` | `None` | Resume existing conversation | -| `callbacks` | `list[ConversationCallbackType] \| None` | `None` | Event callbacks | -| `max_iteration_per_run` | `int` | `500` | Maximum steps per `run()` call | -| `stuck_detection` | `bool` | `True` | Enable stuck detection | -| `visualize` | `bool` | `True` | Generate visualizations | -| `secrets` | `dict \| None` | `None` | Secret values for agent | - -## Agent Lifecycle - -```mermaid -sequenceDiagram - participant User - participant Conversation - participant Agent - participant State - - User->>Conversation: Create conversation(agent) - Conversation->>State: Initialize state - Conversation->>Agent: init_state() - - User->>Conversation: send_message("Task") - Conversation->>State: Add message event - - User->>Conversation: run() - loop Until agent finishes or max iterations - Conversation->>Agent: step(state) - Agent->>State: Update with actions/observations - Conversation->>User: Callback with events - end - - User->>Conversation: agent_final_response() - Conversation->>User: Return final result -``` - -### 1. Create Agent - -Define agent with LLM and tools: - -```python -agent = Agent(llm=llm, tools=tools) -``` - -### 2. Create Conversation - -Pass agent to conversation: - -```python -conversation = Conversation(agent=agent) -``` - -### 3. Send Messages - -Add user messages to conversation: - -```python -conversation.send_message("Build a web scraper for news articles") -``` - -### 4. Run Agent - -Execute agent until task completion: - -```python -conversation.run() -``` - -The conversation will call `agent.step(state)` repeatedly until: -- Agent calls `FinishTool` -- Maximum iterations reached -- Agent encounters an error -- User pauses execution - -### 5. 
Get Results - -Retrieve agent's final response: - -```python -result = conversation.agent_final_response() -print(result) -``` - -## Core Methods - -**Source**: [`openhands/sdk/conversation/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/base.py) - -### send_message() - -Add a message to the conversation: - -```python -# String message -conversation.send_message("Write unit tests for the API") - -# Message object with images -from openhands.sdk.llm import Message, ImageContent - -message = Message( - role="user", - content=[ - "What's in this image?", - ImageContent(source="path/to/image.png") - ] -) -conversation.send_message(message) -``` - -See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py). - -### run() - -Execute the agent until completion or max iterations: - -```python -# Synchronous execution -conversation.run() - -# Async execution -await conversation.run() -``` - -See [`examples/01_standalone_sdk/11_async.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) for async usage. - -### agent_final_response() - -Get the agent's final response: - -```python -final_response = conversation.agent_final_response() -``` - -### pause() - -Pause agent execution: - -```python -conversation.pause() -``` - -See [`examples/01_standalone_sdk/09_pause_example.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py). 
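Conceptually, `pause()` sets a flag that the run loop checks between agent steps; a minimal stand-in in plain Python (not the SDK's internals) shows why a pause request from another thread takes effect at the next step boundary:

```python
import threading

class TinyRunLoop:
    """Stand-in for a run loop that checks a pause flag between steps."""

    def __init__(self):
        self._pause_requested = threading.Event()
        self.steps = 0

    def pause(self):
        # Safe to call from another thread while run() is executing.
        self._pause_requested.set()

    def run(self, max_steps=5):
        while self.steps < max_steps and not self._pause_requested.is_set():
            self.steps += 1  # stand-in for one agent.step(state)

loop = TinyRunLoop()
loop.run()          # no pause requested: runs all 5 steps
paused = TinyRunLoop()
paused.pause()      # flag set before run(), so the loop exits at once
paused.run()
```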
- -### close() - -Clean up resources: - -```python -conversation.close() -``` - -## Conversation State - -**Source**: [`openhands/sdk/conversation/state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/state.py) - -### Accessing State - -```python -state = conversation.state - -# Conversation properties -print(state.id) # Unique conversation ID -print(state.agent_status) # Current execution status -print(state.events) # Event history - -# Agent and workspace -print(state.agent) # The agent instance -print(state.workspace) # The workspace -``` - -### Agent Execution Status - -```python -from openhands.sdk.conversation.state import AgentExecutionStatus - -status = state.agent_status - -# Possible values: -# - AgentExecutionStatus.IDLE -# - AgentExecutionStatus.RUNNING -# - AgentExecutionStatus.FINISHED -# - AgentExecutionStatus.ERROR -# - AgentExecutionStatus.PAUSED -``` - -## Persistence - -### Saving Conversations - -Conversations are automatically persisted when `persistence_dir` is set: - -```python -conversation = Conversation( - agent=agent, - persistence_dir="conversations" # Saves to conversations// -) -``` - -See [`examples/01_standalone_sdk/10_persistence.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py). 
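The idea behind `persistence_dir` can be sketched without the SDK: serialize the conversation's event history under a per-conversation directory and read it back when resuming. The real on-disk format is an internal detail; the file name and helpers below are illustrative only.

```python
import json
import tempfile
from pathlib import Path

def save_events(persistence_dir, conversation_id, events):
    # One subdirectory per conversation, keyed by its ID.
    conv_dir = Path(persistence_dir) / conversation_id
    conv_dir.mkdir(parents=True, exist_ok=True)
    (conv_dir / "events.json").write_text(json.dumps(events))

def load_events(persistence_dir, conversation_id):
    path = Path(persistence_dir) / conversation_id / "events.json"
    return json.loads(path.read_text())

with tempfile.TemporaryDirectory() as d:
    save_events(d, "conv-123", [{"role": "user", "content": "hi"}])
    restored = load_events(d, "conv-123")
```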
- -### Resuming Conversations - -Resume from a saved conversation ID: - -```python -from openhands.sdk.conversation.types import ConversationID - -# Get conversation ID -conv_id = conversation.id - -# Later, resume with the same ID -resumed_conversation = Conversation( - agent=agent, - conversation_id=conv_id, - persistence_dir="conversations" -) -``` - -## Monitoring and Stats - -**Source**: [`openhands/sdk/conversation/conversation_stats.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/conversation_stats.py) - -### Conversation Stats - -```python -stats = conversation.conversation_stats - -print(stats.total_messages) # Total messages exchanged -print(stats.total_tokens) # Total tokens used -print(stats.total_cost) # Estimated cost -print(stats.duration) # Execution time -``` - -See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py). - -## Event Callbacks - -### Registering Callbacks - -Monitor conversation events in real-time: - -```python -from openhands.sdk.conversation import ConversationCallbackType -from openhands.sdk.event import Event - -def on_event(event: Event): - if isinstance(event, MessageEvent): - print(f"Message: {event.content}") - elif isinstance(event, ActionEvent): - print(f"Action: {event.action.kind}") - elif isinstance(event, ObservationEvent): - print(f"Observation: {event.observation.kind}") - -conversation = Conversation( - agent=agent, - callbacks=[on_event] -) -``` - -## Advanced Features - -### Stuck Detection - -**Source**: [`openhands/sdk/conversation/stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/stuck_detector.py) - -Automatically detects when agents are stuck in loops: - -```python -conversation = Conversation( - agent=agent, - stuck_detection=True # Default: True -) -``` - -See 
[`examples/01_standalone_sdk/20_stuck_detector.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py). - -### Secrets Management - -**Source**: [`openhands/sdk/conversation/secrets_manager.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/secrets_manager.py) - -Provide secrets for agent operations: - -```python -conversation = Conversation( - agent=agent, - secrets={ - "API_KEY": "secret-value", - "DATABASE_URL": "postgres://..." - } -) - -# Update secrets during execution -conversation.update_secrets({ - "NEW_TOKEN": "new-value" -}) -``` - -See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). - -### Visualization - -**Source**: [`openhands/sdk/conversation/visualizer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/visualizer.py) - -Generate execution visualizations: - -```python -conversation = Conversation( - agent=agent, - visualize=True # Default: True -) - -# Visualizations saved to workspace/visualizations/ -``` - -### Title Generation - -Generate conversation titles: - -```python -title = conversation.generate_title(max_length=50) -print(f"Conversation: {title}") -``` - -## Local vs Remote Conversations - -### LocalConversation - -**Source**: [`openhands/sdk/conversation/impl/local_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/local_conversation.py) - -Runs agent locally: - -```python -from openhands.sdk.workspace import LocalWorkspace - -conversation = Conversation( - agent=agent, - workspace=LocalWorkspace(working_dir="/project") -) -``` - -### RemoteConversation - -**Source**: [`openhands/sdk/conversation/impl/remote_conversation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/conversation/impl/remote_conversation.py) - -Runs agent on remote server: - 
-```python -from openhands.workspace import RemoteAPIWorkspace - -conversation = Conversation( - agent=agent, - workspace=RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com" - ) -) -``` - -## Best Practices - -1. **Set Appropriate Iteration Limits**: Prevent runaway executions -2. **Use Persistence**: Save important conversations for resume/replay -3. **Monitor Events**: Use callbacks for real-time monitoring -4. **Handle Errors**: Check agent status and handle failures gracefully -5. **Clean Up Resources**: Call `close()` when done -6. **Enable Stuck Detection**: Catch infinite loops early -7. **Track Stats**: Monitor token usage and costs - -## Complete Example - -```python -from openhands.sdk import Conversation, Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from pydantic import SecretStr - -# Create agent -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Create conversation -conversation = Conversation( - agent=agent, - workspace="workspace/project", - persistence_dir="conversations", - max_iteration_per_run=100 -) - -try: - # Send task - conversation.send_message("Create a simple REST API") - - # Run agent - conversation.run() - - # Get result - result = conversation.agent_final_response() - print(f"Result: {result}") - - # Check stats - stats = conversation.conversation_stats - print(f"Tokens used: {stats.total_tokens}") - print(f"Cost: ${stats.total_cost}") -finally: - # Clean up - conversation.close() -``` - -## See Also - -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration and usage -- **[Events](/sdk/architecture/sdk/event.mdx)** - Event types and handling -- **[Workspace](/sdk/architecture/sdk/workspace.mdx)** - Workspace configuration -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - Usage examples diff 
--git a/sdk/arch/sdk/event.mdx b/sdk/arch/sdk/event.mdx deleted file mode 100644 index a286dab0..00000000 --- a/sdk/arch/sdk/event.mdx +++ /dev/null @@ -1,403 +0,0 @@ ---- -title: Event System -description: Structured event types representing agent actions, observations, and system messages in conversations. ---- - -The event system provides structured representations of all interactions in agent conversations. Events enable state management, LLM communication, and real-time monitoring. - -**Source**: [`openhands/sdk/event/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/event) - -## Core Concepts - -```mermaid -graph TD - Event[Event] --> LLMConvertible[LLMConvertibleEvent] - Event --> NonConvertible[Non-LLM Events] - - LLMConvertible --> Action[ActionEvent] - LLMConvertible --> Observation[ObservationEvent] - LLMConvertible --> Message[MessageEvent] - LLMConvertible --> System[SystemPromptEvent] - - NonConvertible --> State[StateUpdateEvent] - NonConvertible --> User[UserActionEvent] - NonConvertible --> Condenser[CondenserEvent] - - style Event fill:#e1f5fe - style LLMConvertible fill:#fff3e0 - style NonConvertible fill:#e8f5e8 -``` - -Events fall into two categories: -- **LLMConvertibleEvent**: Events that become LLM messages -- **Non-LLM Events**: Internal state and control events - -## Base Event Classes - -**Source**: [`openhands/sdk/event/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/base.py) - -### Event - -Base class for all events: - -```python -from openhands.sdk.event import Event - -class Event: - id: str # Unique event identifier - timestamp: str # ISO format timestamp - source: SourceType # Event source (agent/user/system) -``` - -**Properties**: -- **Immutable**: Events are frozen Pydantic models -- **Serializable**: Full event data can be saved/restored -- **Visualizable**: Rich text representation for display - -### LLMConvertibleEvent - -Events that can be converted to LLM messages: - 
-```python -from openhands.sdk.event import LLMConvertibleEvent -from openhands.sdk.llm import Message - -class LLMConvertibleEvent(Event): - def to_llm_message(self) -> Message: - """Convert event to LLM message format.""" - ... -``` - -These events form the conversation history sent to the LLM. - -## LLM-Convertible Events - -### ActionEvent - -**Source**: [`openhands/sdk/event/llm_convertible/action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/action.py) - -Represents actions taken by the agent: - -```python -from openhands.sdk.event import ActionEvent -from openhands.sdk.tool import Action - -class ActionEvent(LLMConvertibleEvent): - action: Action # The action being executed - thought: str # Agent's reasoning (optional) -``` - -**Purpose**: Records what the agent decided to do. - -**Example**: -```python -from openhands.tools import BashAction - -action_event = ActionEvent( - source="agent", - action=BashAction(command="ls -la"), - thought="List files to understand directory structure" -) -``` - -### ObservationEvent - -**Source**: [`openhands/sdk/event/llm_convertible/observation.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/observation.py) - -Represents observations from tool execution: - -```python -from openhands.sdk.event import ObservationEvent -from openhands.sdk.tool import Observation - -class ObservationEvent(LLMConvertibleEvent): - observation: Observation # Tool execution result -``` - -**Purpose**: Records the outcome of agent actions. 
- -**Example**: -```python -from openhands.tools import BashObservation - -observation_event = ObservationEvent( - source="tool", - observation=BashObservation( - output="file1.txt\nfile2.py\n", - exit_code=0 - ) -) -``` - -**Related Events**: -- **AgentErrorEvent**: Agent execution errors -- **UserRejectObservation**: User rejected an action - -### MessageEvent - -**Source**: [`openhands/sdk/event/llm_convertible/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/message.py) - -Represents messages in the conversation: - -```python -from openhands.sdk.event import MessageEvent - -class MessageEvent(LLMConvertibleEvent): - content: str | list # Message content (text or multimodal) - role: str # Role: "user", "assistant", "system" - images_urls: list[str] # Optional image URLs -``` - -**Purpose**: User messages, agent responses, and system messages. - -**Example**: -```python -message_event = MessageEvent( - source="user", - content="Create a web scraper", - role="user" -) -``` - -### SystemPromptEvent - -**Source**: [`openhands/sdk/event/llm_convertible/system.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/llm_convertible/system.py) - -Represents system prompts: - -```python -from openhands.sdk.event import SystemPromptEvent - -class SystemPromptEvent(LLMConvertibleEvent): - content: str # System prompt content -``` - -**Purpose**: Provides instructions and context to the agent. - -## Non-LLM Events - -### ConversationStateUpdateEvent - -**Source**: [`openhands/sdk/event/conversation_state.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/conversation_state.py) - -Tracks conversation state changes: - -```python -from openhands.sdk.event import ConversationStateUpdateEvent - -class ConversationStateUpdateEvent(Event): - # Internal state update event - # Not sent to LLM -``` - -**Purpose**: Internal tracking of conversation state transitions. 
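The practical effect of the LLM-convertible/internal split can be shown with a toy filter: only convertible events reach the model's context, while internal events such as state updates and pauses stay in the history. Plain dicts stand in for event objects here.

```python
# Mixed event history; "llm_convertible" marks which entries
# would be turned into model messages.
events = [
    {"kind": "system_prompt", "llm_convertible": True},
    {"kind": "message", "llm_convertible": True},
    {"kind": "state_update", "llm_convertible": False},
    {"kind": "action", "llm_convertible": True},
    {"kind": "pause", "llm_convertible": False},
]

# Context sent to the LLM keeps only the convertible events, in order.
llm_context = [e["kind"] for e in events if e["llm_convertible"]]
```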
- -### PauseEvent - -**Source**: [`openhands/sdk/event/user_action.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/user_action.py) - -User paused the conversation: - -```python -from openhands.sdk.event import PauseEvent - -class PauseEvent(Event): - pass -``` - -**Purpose**: Signal that user has paused agent execution. - -### Condenser Events - -**Source**: [`openhands/sdk/event/condenser.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/condenser.py) - -Track context condensation: - -#### Condensation - -```python -class Condensation(Event): - content: str # Condensed summary -``` - -**Purpose**: Record the condensed conversation history. - -#### CondensationRequest - -```python -class CondensationRequest(Event): - pass -``` - -**Purpose**: Request context condensation. - -#### CondensationSummaryEvent - -```python -class CondensationSummaryEvent(LLMConvertibleEvent): - content: str # Summary for LLM -``` - -**Purpose**: Provide condensed context to LLM. 
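The net effect of these condenser events on the history can be sketched in a few lines: older entries collapse into a single summary entry while the most recent entries survive verbatim. This is an illustrative stand-in, not the SDK's condensation algorithm.

```python
def condense(history, preserve_recent=2):
    """Replace all but the last `preserve_recent` events with one summary."""
    if len(history) <= preserve_recent:
        return history
    old, recent = history[:-preserve_recent], history[-preserve_recent:]
    summary = f"summary of {len(old)} earlier events"
    return [{"kind": "condensation_summary", "content": summary}] + recent

history = [{"kind": "message", "content": f"msg {i}"} for i in range(5)]
condensed = condense(history, preserve_recent=2)
```

Five messages become three entries: one summary followed by the two preserved recent messages.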
- -## Event Source Types - -**Source**: [`openhands/sdk/event/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/event/types.py) - -```python -SourceType = Literal["agent", "user", "tool", "system"] -``` - -- **agent**: Events from the agent -- **user**: Events from the user -- **tool**: Events from tool execution -- **system**: System-generated events - -## Event Streams - -### Converting to LLM Messages - -Events are converted to LLM messages for context: - -```python -from openhands.sdk.event import LLMConvertibleEvent - -events = [action_event, observation_event, message_event] -messages = LLMConvertibleEvent.events_to_messages(events) - -# Send to LLM -response = llm.completion(messages=messages) -``` - -### Event Batching - -Multiple actions in a single step are batched: - -```python -# Multi-action events -action1 = ActionEvent(action=BashAction(...)) -action2 = ActionEvent(action=FileEditAction(...)) - -# Converted to single LLM message with multiple tool calls -messages = LLMConvertibleEvent.events_to_messages([action1, action2]) -``` - -## Event Visualization - -Events support rich text visualization: - -```python -from openhands.sdk.event import Event - -event = MessageEvent( - source="user", - content="Hello", - role="user" -) - -# Rich text representation -print(event.visualize) - -# Plain text -print(str(event)) -# Output: MessageEvent (user) -# user: Hello -``` - -## Event Callbacks - -Monitor events in real-time: - -```python -from openhands.sdk import Conversation -from openhands.sdk.event import ( - Event, - ActionEvent, - ObservationEvent, - MessageEvent -) - -def on_event(event: Event): - if isinstance(event, MessageEvent): - print(f"šŸ’¬ Message: {event.content}") - elif isinstance(event, ActionEvent): - print(f"šŸ”§ Action: {event.action.kind}") - elif isinstance(event, ObservationEvent): - print(f"šŸ‘ļø Observation: {event.observation.content}") - -conversation = Conversation( - agent=agent, - callbacks=[on_event] 
-) -``` - -## Event History - -Access conversation event history: - -```python -conversation = Conversation(agent=agent) -conversation.send_message("Task") -conversation.run() - -# Get all events -events = conversation.state.events - -# Filter by type -actions = [e for e in events if isinstance(e, ActionEvent)] -observations = [e for e in events if isinstance(e, ObservationEvent)] -messages = [e for e in events if isinstance(e, MessageEvent)] -``` - -## Serialization - -Events are fully serializable: - -```python -# Serialize event -event_json = event.model_dump_json() - -# Deserialize -from openhands.sdk.event import Event -restored_event = Event.model_validate_json(event_json) -``` - -## Best Practices - -1. **Use Type Guards**: Check event types with `isinstance()` -2. **Handle All Types**: Cover all event types in callbacks -3. **Preserve Immutability**: Never mutate event objects -4. **Log Events**: Keep event history for debugging -5. **Filter Strategically**: Process only relevant events -6. 
**Visualize for Debugging**: Use `event.visualize` for rich output - -## Event Lifecycle - -```mermaid -sequenceDiagram - participant User - participant Conversation - participant Agent - participant Events - - User->>Conversation: send_message() - Conversation->>Events: MessageEvent - - Conversation->>Agent: step() - Agent->>Events: ActionEvent(s) - - Agent->>Tool: Execute - Tool->>Events: ObservationEvent(s) - - Events->>LLM: Convert to messages - LLM->>Agent: Generate response - - Agent->>Events: New ActionEvent(s) -``` - -## See Also - -- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Managing conversations and event streams -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent execution and event generation -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool actions and observations -- **[Condenser](/sdk/architecture/sdk/condenser.mdx)** - Context condensation events diff --git a/sdk/arch/sdk/llm.mdx b/sdk/arch/sdk/llm.mdx deleted file mode 100644 index 3a418d92..00000000 --- a/sdk/arch/sdk/llm.mdx +++ /dev/null @@ -1,416 +0,0 @@ ---- -title: LLM Integration -description: Language model integration supporting multiple providers through LiteLLM with built-in retry logic and metrics tracking. ---- - -The LLM class provides a unified interface for language model integration, supporting multiple providers through [LiteLLM](https://docs.litellm.ai/). It handles authentication, retries, metrics tracking, and streaming responses. 
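The built-in retry handling applies exponential backoff between attempts. A standalone sketch of how the `num_retries`, `retry_min_wait`, `retry_max_wait`, and `retry_multiplier` settings interact (an illustration of the schedule, not the SDK's implementation):

```python
def backoff_schedule(num_retries=8, min_wait=3, max_wait=60, multiplier=2.0):
    """Exponential backoff waits, clamped to [min_wait, max_wait]."""
    waits = []
    wait = min_wait
    for _ in range(num_retries):
        waits.append(wait)
        # Each retry waits longer, up to the configured ceiling.
        wait = min(wait * multiplier, max_wait)
    return waits

print(backoff_schedule())
```

With the defaults, waits grow 3, 6, 12, 24, 48 seconds and then saturate at the 60-second ceiling for the remaining attempts.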
- -**Source**: [`openhands/sdk/llm/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/llm) - -## Core Concepts - -```mermaid -graph LR - LLM[LLM] --> Completion["completion()"] - LLM --> Metrics[Metrics Tracking] - LLM --> Retry[Retry Logic] - - Completion --> Provider[Provider API] - Provider --> OpenAI[OpenAI] - Provider --> Anthropic[Anthropic] - Provider --> Others[Other Providers] - - style LLM fill:#e1f5fe - style Completion fill:#fff3e0 - style Metrics fill:#e8f5e8 - style Retry fill:#f3e5f5 -``` - -## Basic Usage - -**Source**: [`openhands/sdk/llm/llm.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/llm.py) - -### Creating an LLM - -```python -from openhands.sdk import LLM -from pydantic import SecretStr - -# Basic configuration -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") -) - -# With custom settings -llm = LLM( - model="openai/gpt-4", - api_key=SecretStr("your-api-key"), - base_url="https://api.openai.com/v1", - temperature=0.7, - max_tokens=4096, - timeout=60.0 -) -``` - -### Configuration Parameters - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `model` | `str` | `"claude-sonnet-4-20250514"` | Model identifier | -| `api_key` | `SecretStr \| None` | `None` | API key for authentication | -| `base_url` | `str \| None` | `None` | Custom API endpoint | -| `temperature` | `float` | `0.0` | Sampling temperature (0-2) | -| `max_tokens` | `int \| None` | `None` | Maximum tokens to generate | -| `timeout` | `float` | `60.0` | Request timeout in seconds | -| `num_retries` | `int` | `8` | Number of retry attempts | -| `retry_min_wait` | `int` | `3` | Minimum retry wait (seconds) | -| `retry_max_wait` | `int` | `60` | Maximum retry wait (seconds) | -| `retry_multiplier` | `float` | `2.0` | Retry backoff multiplier | - -## Generating Completions - -### Basic Completion - -```python -from openhands.sdk.llm import Message - 
-messages = [ - Message(role="user", content="What is the capital of France?") -] - -response = llm.completion(messages=messages) -print(response.choices[0].message.content) -# Output: "The capital of France is Paris." -``` - -### With Tool Calling - -```python -from openhands.sdk import Agent -from openhands.tools import BashTool - -# Tools are automatically converted to function schemas -agent = Agent( - llm=llm, - tools=[BashTool.create()] -) - -# LLM receives tool schemas and can call them -``` - -### Streaming Responses - -```python -# Enable streaming -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key"), - stream=True -) - -# Stream response chunks -for chunk in llm.completion(messages=messages): - if chunk.choices[0].delta.content: - print(chunk.choices[0].delta.content, end="") -``` - -## Model Providers - -The SDK supports all providers available in LiteLLM: - -### Anthropic - -```python -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("sk-ant-...") -) -``` - -### OpenAI - -```python -llm = LLM( - model="openai/gpt-4", - api_key=SecretStr("sk-...") -) -``` - -### Azure OpenAI - -```python -llm = LLM( - model="azure/gpt-4", - api_key=SecretStr("your-azure-key"), - api_base="https://your-resource.openai.azure.com", - api_version="2024-02-01" -) -``` - -### Custom Providers - -```python -llm = LLM( - model="custom-provider/model-name", - base_url="https://custom-api.example.com/v1", - api_key=SecretStr("your-api-key") -) -``` - -See [LiteLLM providers](https://docs.litellm.ai/docs/providers) for full list. - -## LLM Registry - -**Source**: Use pre-configured LLM instances from registry. 
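A registry maps short, stable names to full model configurations so call sites never hard-code provider strings. A toy sketch of that lookup (illustrative only; the SDK's registry API may differ, see the linked example for real usage):

```python
# Hypothetical named configurations; keys and fields are for illustration.
LLM_REGISTRY = {
    "claude-sonnet-4": {"model": "anthropic/claude-sonnet-4-20250514", "temperature": 0.0},
    "gpt-4o-mini": {"model": "openai/gpt-4o-mini", "temperature": 0.0},
}

def get_config(name: str) -> dict:
    # Fail loudly with the list of known names on a typo.
    try:
        return LLM_REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown LLM '{name}'; known: {sorted(LLM_REGISTRY)}")

print(get_config("claude-sonnet-4")["model"])
```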
- -See [`examples/01_standalone_sdk/05_use_llm_registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py): - -```python -from openhands.sdk.llm.registry import get_llm - -# Get pre-configured LLM -llm = get_llm( - model_name="claude-sonnet-4", - # Configuration from environment or defaults -) -``` - -## Metrics and Monitoring - -### Tracking Metrics - -**Source**: [`openhands/sdk/llm/utils/metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/utils/metrics.py) - -```python -# Get metrics snapshot -metrics = llm.metrics.snapshot() - -print(f"Token usage: {metrics.accumulated_token_usage}") -print(f"Total cost: ${metrics.accumulated_cost}") -print(f"Requests: {metrics.total_requests}") -``` - -See [`examples/01_standalone_sdk/13_get_llm_metrics.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py). - -### Cost Tracking - -```python -from openhands.sdk.conversation import Conversation - -conversation = Conversation(agent=Agent(llm=llm, tools=tools)) -conversation.send_message("Task") -conversation.run() - -# Get conversation stats -stats = conversation.conversation_stats -print(f"Total tokens: {stats.total_tokens}") -print(f"Estimated cost: ${stats.total_cost}") -``` - -## Advanced Features - -### LLM Routing - -**Source**: Route between different LLMs based on criteria. 
- -See [`examples/01_standalone_sdk/19_llm_routing.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py): - -```python -# Use different LLMs for different tasks -fast_llm = LLM(model="openai/gpt-4o-mini", api_key=SecretStr("...")) -powerful_llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key=SecretStr("...")) - -# Route based on task complexity -if task_is_simple: - agent = Agent(llm=fast_llm, tools=tools) -else: - agent = Agent(llm=powerful_llm, tools=tools) -``` - -### Model Reasoning - -**Source**: Access model reasoning from Anthropic thinking blocks and OpenAI responses API. - -See [`examples/01_standalone_sdk/22_model_reasoning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/22_model_reasoning.py): - -```python -# Enable Anthropic extended thinking -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key"), - thinking={"type": "enabled", "budget_tokens": 1000} -) - -# Or use OpenAI responses API for reasoning -llm = LLM( - model="openai/gpt-5-codex", - api_key=SecretStr("your-api-key"), - reasoning_effort="high" -) -``` - -## Error Handling - -### Automatic Retries - -The LLM class automatically retries on transient failures: - -```python -from litellm.exceptions import RateLimitError, APIConnectionError - -# These exceptions trigger automatic retry: -# - APIConnectionError -# - RateLimitError -# - ServiceUnavailableError -# - Timeout -# - InternalServerError - -# Configure retry behavior -llm = LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key"), - num_retries=8, # Number of retries - retry_min_wait=3, # Min wait between retries (seconds) - retry_max_wait=60, # Max wait between retries (seconds) - retry_multiplier=2.0 # Exponential backoff multiplier -) -``` - -### Exception Handling - -```python -from litellm.exceptions import ( - RateLimitError, - ContextWindowExceededError, - BadRequestError 
-) - -try: - response = llm.completion(messages=messages) -except RateLimitError: - print("Rate limit exceeded, automatic retry in progress") -except ContextWindowExceededError: - print("Context window exceeded, reduce message history") -except BadRequestError as e: - print(f"Bad request: {e}") -``` - -## Message Types - -**Source**: [`openhands/sdk/llm/message.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py) - -### Text Messages - -```python -from openhands.sdk.llm import Message - -message = Message( - role="user", - content="Hello, how are you?" -) -``` - -### Multimodal Messages - -```python -from openhands.sdk.llm import Message, ImageContent - -message = Message( - role="user", - content=[ - "What's in this image?", - ImageContent(source="path/to/image.png") - ] -) -``` - -See [`examples/01_standalone_sdk/17_image_input.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py). - -### Tool Call Messages - -```python -from openhands.sdk.llm import Message, MessageToolCall - -# Message with tool calls -message = Message( - role="assistant", - content="Let me run that command", - tool_calls=[ - MessageToolCall( - id="call_123", - function={"name": "execute_bash", "arguments": '{"command": "ls"}'} - ) - ] -) -``` - -## Model Features - -### Vision Support - -```python -from litellm.utils import supports_vision - -if supports_vision(llm.model): - # Model supports image inputs - message = Message( - role="user", - content=["Describe this image", ImageContent(source="image.png")] - ) -``` - -### Token Counting - -```python -from litellm.utils import token_counter - -# Count tokens in messages -messages = [Message(role="user", content="Hello world")] -tokens = token_counter(model=llm.model, messages=messages) -print(f"Message uses {tokens} tokens") -``` - -### Model Information - -```python -from litellm.utils import get_model_info - -info = get_model_info(llm.model) -print(f"Max 
tokens: {info['max_tokens']}") -print(f"Cost per token: {info['input_cost_per_token']}") -``` - -## Best Practices - -1. **Set Appropriate Timeouts**: Adjust based on expected response time -2. **Configure Retries**: Balance reliability with latency requirements -3. **Monitor Metrics**: Track token usage and costs -4. **Handle Exceptions**: Implement proper error handling -5. **Use Streaming**: For better user experience with long responses -6. **Secure API Keys**: Use `SecretStr` and environment variables -7. **Choose Right Model**: Balance cost, speed, and capability - -## Environment Variables - -Configure LLM via environment variables: - -```bash -# API keys -export ANTHROPIC_API_KEY="sk-ant-..." -export OPENAI_API_KEY="sk-..." -export AZURE_API_KEY="..." - -# Custom endpoints -export OPENAI_API_BASE="https://custom-endpoint.com" - -# Model defaults -export LLM_MODEL="anthropic/claude-sonnet-4-20250514" -``` - -## See Also - -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Using LLMs with agents -- **[Message Types](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/message.py)** - Message structure -- **[LiteLLM Documentation](https://docs.litellm.ai/)** - Provider details -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/01_standalone_sdk)** - LLM usage examples diff --git a/sdk/arch/sdk/mcp.mdx b/sdk/arch/sdk/mcp.mdx deleted file mode 100644 index ea18a670..00000000 --- a/sdk/arch/sdk/mcp.mdx +++ /dev/null @@ -1,333 +0,0 @@ ---- -title: MCP Integration -description: Connect agents to external tools and services through the Model Context Protocol. ---- - -MCP (Model Context Protocol) integration enables agents to connect to external tools and services through a standardized protocol. The SDK seamlessly converts MCP tools into native agent tools. - -**Source**: [`openhands/sdk/mcp/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/mcp) - -## What is MCP? 
- -[Model Context Protocol](https://modelcontextprotocol.io/) is an open protocol that standardizes how AI applications connect to external data sources and tools. It enables: - -- **Standardized Integration**: Connect to any MCP-compliant service -- **Dynamic Discovery**: Tools are discovered at runtime -- **Multiple Transports**: Support for stdio, HTTP, and SSE -- **OAuth Support**: Secure authentication for external services - -## Basic Usage - -### Creating MCP Tools - -```python -from openhands.sdk import create_mcp_tools - -mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } -} - -# Create MCP tools -mcp_tools = create_mcp_tools(mcp_config, timeout=30) - -# Use with agent -from openhands.sdk import Agent -from openhands.tools import BashTool - -agent = Agent( - llm=llm, - tools=[ - BashTool.create(), - *mcp_tools # Add MCP tools - ] -) -``` - -See [`examples/01_standalone_sdk/07_mcp_integration.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py). - -### Using MCP Config in Agent - -```python -# Simpler: provide MCP config directly to agent -agent = Agent( - llm=llm, - tools=[BashTool.create()], - mcp_config={ - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - } - } - } -) -``` - -## Configuration Formats - -The SDK uses the [FastMCP configuration format](https://gofastmcp.com/clients/client#configuration-format). 
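Before handing a config to the SDK, a quick structural check catches the most common mistakes (a stdio server without a `command`, a remote server without a `url`). This validator is illustrative only; the SDK delegates real parsing to FastMCP:

```python
def validate_mcp_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    for name, server in config.get("mcpServers", {}).items():
        # "stdio" is the default transport when none is given.
        transport = server.get("transport", "stdio")
        if transport == "stdio" and "command" not in server:
            problems.append(f"{name}: stdio servers need a 'command'")
        if transport in ("http", "sse") and "url" not in server:
            problems.append(f"{name}: {transport} servers need a 'url'")
    return problems

config = {"mcpServers": {
    "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    "remote": {"transport": "http"},  # missing url, flagged below
}}
print(validate_mcp_config(config))
```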
- -### Stdio Servers - -Run local MCP servers via stdio: - -```python -mcp_config = { - "mcpServers": { - "filesystem": { - "transport": "stdio", # Optional, default - "command": "python", - "args": ["./mcp-server-filesystem.py"], - "env": {"DEBUG": "true"}, - "cwd": "/path/to/server" - } - } -} -``` - -### HTTP/SSE Servers - -Connect to remote MCP servers: - -```python -mcp_config = { - "mcpServers": { - "remote_api": { - "transport": "http", # or "sse" - "url": "https://api.example.com/mcp", - "headers": { - "Authorization": "Bearer token" - } - } - } -} -``` - -### OAuth Authentication - -Authenticate with OAuth-enabled services: - -```python -mcp_config = { - "mcpServers": { - "google_drive": { - "transport": "http", - "url": "https://mcp.google.com/drive", - "auth": "oauth", # Enable OAuth flow - } - } -} -``` - -See [`examples/01_standalone_sdk/08_mcp_with_oauth.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py). - -## Available MCP Servers - -Popular MCP servers you can integrate: - -### Official Servers - -- **fetch**: HTTP requests ([mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)) -- **filesystem**: File operations ([mcp-server-filesystem](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem)) -- **git**: Git operations ([mcp-server-git](https://github.com/modelcontextprotocol/servers/tree/main/src/git)) -- **github**: GitHub API ([mcp-server-github](https://github.com/modelcontextprotocol/servers/tree/main/src/github)) -- **postgres**: PostgreSQL queries ([mcp-server-postgres](https://github.com/modelcontextprotocol/servers/tree/main/src/postgres)) - -### Community Servers - -See [MCP Servers Directory](https://github.com/modelcontextprotocol/servers) for more. 
- -## MCP Tool Conversion - -MCP tools are automatically converted to SDK tools: - -```mermaid -graph LR - MCPServer[MCP Server] --> Discovery[Tool Discovery] - Discovery --> Schema[Tool Schema] - Schema --> SDKTool[SDK Tool] - SDKTool --> Agent[Agent] - - style MCPServer fill:#e1f5fe - style SDKTool fill:#fff3e0 - style Agent fill:#e8f5e8 -``` - -1. **Discovery**: MCP server lists available tools -2. **Schema Extraction**: Tool schemas extracted from MCP -3. **Tool Creation**: SDK tools created with proper typing -4. **Agent Integration**: Tools available to agent - -## Configuration Options - -### Timeout - -Set connection timeout for MCP servers: - -```python -mcp_tools = create_mcp_tools(mcp_config, timeout=60) # 60 seconds -``` - -### Multiple Servers - -Configure multiple MCP servers: - -```python -mcp_config = { - "mcpServers": { - "fetch": { - "command": "uvx", - "args": ["mcp-server-fetch"] - }, - "filesystem": { - "command": "uvx", - "args": ["mcp-server-filesystem"] - }, - "github": { - "command": "uvx", - "args": ["mcp-server-github"] - } - } -} -``` - -All tools from all servers are available to the agent. - -## Error Handling - -```python -try: - mcp_tools = create_mcp_tools(mcp_config, timeout=30) -except TimeoutError: - print("MCP server connection timed out") -except Exception as e: - print(f"Failed to create MCP tools: {e}") - mcp_tools = [] # Continue without MCP tools - -agent = Agent(llm=llm, tools=[*base_tools, *mcp_tools]) -``` - -## Tool Filtering - -Filter MCP tools using regex: - -```python -agent = Agent( - llm=llm, - tools=tools, - mcp_config=mcp_config, - filter_tools_regex="^fetch_.*" # Only tools starting with "fetch_" -) -``` - -## Best Practices - -1. **Set Appropriate Timeouts**: MCP servers may take time to initialize -2. **Handle Failures Gracefully**: Continue with reduced functionality if MCP fails -3. **Use Official Servers**: Start with well-tested MCP servers -4. 
**Secure Credentials**: Use environment variables for sensitive data -5. **Test Locally First**: Verify MCP servers work before deploying -6. **Monitor Performance**: MCP adds latency, monitor impact -7. **Version Pin**: Specify exact versions of MCP servers - -## Environment Variables - -Configure MCP servers via environment: - -```bash -# GitHub MCP server -export GITHUB_PERSONAL_ACCESS_TOKEN="ghp_..." - -# Google Drive OAuth -export GOOGLE_CLIENT_ID="..." -export GOOGLE_CLIENT_SECRET="..." - -# Custom MCP endpoints -export MCP_FETCH_URL="https://custom-mcp.example.com" -``` - -## Advanced Usage - -### Custom MCP Client - -For advanced control, use the MCP client directly: - -```python -from openhands.sdk.mcp.client import MCPClient - -# Create custom MCP client -client = MCPClient( - server_config={ - "command": "python", - "args": ["./custom-server.py"] - }, - timeout=60 -) - -# Get tools from client -tools = client.list_tools() - -# Use tools with agent -agent = Agent(llm=llm, tools=tools) -``` - -## Debugging - -### Enable Debug Logging - -```python -import logging - -logging.getLogger("openhands.sdk.mcp").setLevel(logging.DEBUG) -``` - -### Verify MCP Server - -Test MCP server independently: - -```bash -# Run MCP server directly -uvx mcp-server-fetch - -# Check if server responds -curl http://localhost:3000/mcp/tools -``` - -## Common Issues - -### Server Not Found - -```bash -# Ensure server is installed -# For uvx-based servers: -uvx --help # Check if uvx is available -uvx mcp-server-fetch --help # Check if server is available -``` - -### Connection Timeout - -```python -# Increase timeout -mcp_tools = create_mcp_tools(mcp_config, timeout=120) -``` - -### OAuth Flow Issues - -```python -# Ensure OAuth credentials are configured -# Check browser opens for OAuth consent -# Verify redirect URL matches configuration -``` - -## See Also - -- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Official MCP documentation -- **[MCP 
Servers](https://github.com/modelcontextprotocol/servers)** - Official server implementations -- **[FastMCP](https://gofastmcp.com/)** - Configuration format documentation -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - SDK tool system -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py)** - MCP integration examples diff --git a/sdk/arch/sdk/microagents.mdx b/sdk/arch/sdk/microagents.mdx deleted file mode 100644 index 00c95dd8..00000000 --- a/sdk/arch/sdk/microagents.mdx +++ /dev/null @@ -1,225 +0,0 @@ ---- -title: Microagents -description: Specialized context providers that inject targeted knowledge into agent conversations. ---- - -Microagents are specialized context providers that inject targeted knowledge into agent conversations when specific triggers are detected. They enable modular, reusable expertise without modifying the main agent. - -## What are Microagents? - -Microagents provide focused knowledge or instructions that are dynamically added to the agent's context when relevant keywords are detected in the conversation. This allows agents to access specialized expertise on-demand. - -For a comprehensive guide on using microagents, see the [official microagents documentation](https://docs.all-hands.dev/usage/prompting/microagents-overview). - -**Source**: [`openhands/sdk/context/microagents/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/context/microagents) - -## Microagent Types - -**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) - -The SDK provides three types of microagents, each serving a distinct purpose: - -### 1. 
KnowledgeMicroagent - -**Source**: [`openhands/sdk/context/microagents/microagent.py#L162`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L162) - -Provides specialized expertise triggered by keywords in conversations. - -**Activation Logic:** -- Contains a list of trigger keywords -- Activated when any trigger appears in conversation -- Case-insensitive matching - -**Use Cases:** -- Language best practices (Python, JavaScript, etc.) -- Framework guidelines (React, Django, etc.) -- Common patterns and anti-patterns -- Tool usage instructions - -**Example:** -```python -from openhands.sdk.context.microagents import KnowledgeMicroagent - -microagent = KnowledgeMicroagent( - name="python_testing", - content="Always use pytest for Python tests...", - triggers=["pytest", "test", "unittest"] -) - -# Triggered when message contains "pytest", "test", or "unittest" -``` - -### 2. RepoMicroagent - -**Source**: [`openhands/sdk/context/microagents/microagent.py#L191`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L191) - -Repository-specific knowledge that's always active when working with a repository. - -**Activation Logic:** -- No triggers required -- Always loaded and active for the repository -- Can define MCP tools configuration - -**Use Cases:** -- Repository-specific guidelines -- Team practices and conventions -- Project-specific workflows -- Custom documentation references - -**Special Files:** -- `.openhands_instructions` - Legacy repo instructions -- `.cursorrules` - Cursor IDE rules (auto-loaded) -- `agents.md` / `agent.md` - Agent instructions (auto-loaded) - -**Example:** -```python -from openhands.sdk.context.microagents import RepoMicroagent - -microagent = RepoMicroagent( - name="project_guidelines", - content="This project uses...", - mcp_tools={"github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]}} -) -``` - -### 3. 
TaskMicroagent - -**Source**: [`openhands/sdk/context/microagents/microagent.py#L236`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py#L236) - -Specialized KnowledgeMicroagent that requires user input before execution. - -**Activation Logic:** -- Triggered by `/{agent_name}` format -- Prompts user for required inputs -- Processes inputs before injecting knowledge - -**Use Cases:** -- Deployment procedures requiring credentials -- Multi-step workflows with parameters -- Interactive debugging sessions -- Customized task execution - -**Example:** -```python -from openhands.sdk.context.microagents import TaskMicroagent, InputMetadata - -microagent = TaskMicroagent( - name="deploy", - content="Deploy to {environment} with {version}...", - triggers=["/deploy"], - inputs=[ - InputMetadata(name="environment", type="string", required=True), - InputMetadata(name="version", type="string", required=True) - ] -) - -# User types: "/deploy" -# Agent prompts: "Enter environment:" "Enter version:" -# Agent proceeds with filled template -``` - -## How Microagents Work - -```mermaid -sequenceDiagram - participant User - participant Agent - participant Microagent - participant LLM - - User->>Agent: "Run the tests" - Agent->>Agent: Detect keyword "tests" - Agent->>Microagent: Fetch testing microagent - Microagent->>Agent: Return testing guidelines - Agent->>LLM: Context + guidelines - LLM->>Agent: Response with testing knowledge - Agent->>User: Execute tests with guidelines -``` - -## Using Microagents - -### Basic Usage - -```python -from openhands.sdk import Agent, AgentContext - -# Create context with microagents -context = AgentContext( - microagents=["testing_expert", "code_reviewer"] -) - -# Create agent with microagents -agent = Agent( - llm=llm, - tools=tools, - agent_context=context -) -``` - -### Example Integration - -See 
[`examples/01_standalone_sdk/03_activate_microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py) for a complete example. - -## Microagent Structure - -**Source**: [`openhands/sdk/context/microagents/microagent.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/microagent.py) - -A microagent consists of: -- **Name**: Unique identifier -- **Triggers**: Keywords that activate the microagent -- **Content**: Knowledge or instructions to inject -- **Type**: One of "knowledge", "repo", or "task" - -## Response Models - -**Source**: [`openhands/sdk/context/microagents/types.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/context/microagents/types.py) - -### MicroagentKnowledge - -```python -class MicroagentKnowledge(BaseModel): - name: str # Microagent name - trigger: str # Keyword that triggered it - content: str # Injected content -``` - -### MicroagentResponse - -```python -class MicroagentResponse(BaseModel): - name: str # Microagent name - path: str # Path or identifier - created_at: datetime # Creation timestamp -``` - -### MicroagentContentResponse - -```python -class MicroagentContentResponse(BaseModel): - content: str # Full microagent content - path: str # Path or identifier - triggers: list[str] # Trigger keywords - git_provider: str | None # Git source if applicable -``` - -## Benefits - -1. **Modularity**: Separate specialized knowledge from main agent logic -2. **Reusability**: Share microagents across multiple agents -3. **Maintainability**: Update expertise without modifying agent code -4. **Context-Aware**: Only inject relevant knowledge when needed -5. **Composability**: Combine multiple microagents for comprehensive coverage - -## Best Practices - -1. **Clear Triggers**: Use specific, unambiguous trigger keywords -2. **Focused Content**: Keep microagent content concise and targeted -3. 
**Avoid Overlap**: Minimize trigger conflicts between microagents -4. **Version Control**: Store microagents in version-controlled repositories -5. **Documentation**: Document trigger keywords and intended use cases - -## See Also - -- **[Official Microagents Guide](https://docs.all-hands.dev/usage/prompting/microagents-overview)** - Comprehensive documentation -- **[Agent Context](/sdk/architecture/sdk/agent.mdx)** - Using context with agents -- **[Example Code](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_microagent.py)** - Working example diff --git a/sdk/arch/sdk/security.mdx b/sdk/arch/sdk/security.mdx deleted file mode 100644 index a41264fc..00000000 --- a/sdk/arch/sdk/security.mdx +++ /dev/null @@ -1,416 +0,0 @@ ---- -title: Security -description: Analyze and control agent actions through security analyzers and confirmation policies. ---- - -The security system enables control over agent actions through risk analysis and confirmation policies. It helps prevent dangerous operations while maintaining agent autonomy for safe actions. 
- -**Source**: [`openhands/sdk/security/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/security) - -## Core Concepts - -```mermaid -graph TD - Action[Agent Action] --> Analyzer[Security Analyzer] - Analyzer --> Risk[Risk Assessment] - Risk --> Policy[Confirmation Policy] - - Policy --> Low{Risk Level} - Low -->|Low| Execute[Execute] - Low -->|Medium| MaybeConfirm[Policy Decision] - Low -->|High| Confirm[Request Confirmation] - - Confirm --> User[User Decision] - User -->|Approve| Execute - User -->|Reject| Block[Block Action] - - style Action fill:#e1f5fe - style Analyzer fill:#fff3e0 - style Policy fill:#e8f5e8 - style Execute fill:#c8e6c9 - style Block fill:#ffcdd2 -``` - -The security system consists of two components: -- **Security Analyzer**: Assesses risk level of actions -- **Confirmation Policy**: Decides when to require user confirmation - -## Security Analyzer - -### LLM Security Analyzer - -**Source**: [`openhands/sdk/security/llm_analyzer.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/llm_analyzer.py) - -Uses an LLM to analyze action safety: - -```python -from openhands.sdk.security import LLMSecurityAnalyzer -from openhands.sdk import Agent, LLM -from pydantic import SecretStr - -# Create security analyzer -security_analyzer = LLMSecurityAnalyzer( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ) -) - -# Create agent with security analyzer -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer -) -``` - -### Risk Levels - -**Source**: [`openhands/sdk/security/risk.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/risk.py) - -```python -from openhands.sdk.security.risk import SecurityRisk - -# Risk levels -SecurityRisk.LOW # Safe operations (read files, list directories) -SecurityRisk.MEDIUM # Potentially impactful (write files, API calls) -SecurityRisk.HIGH # Dangerous operations (delete files, 
system changes) -``` - -### How LLM Analyzer Works - -1. **Action Inspection**: Examines the action and its parameters -2. **Context Analysis**: Considers conversation history and workspace -3. **Risk Assessment**: LLM predicts risk level with reasoning -4. **Risk Return**: Returns risk level and explanation - -```python -# Example internal flow -action = BashAction(command="rm -rf /") -risk = security_analyzer.analyze(action, context) -# Returns: SecurityRisk.HIGH, "Attempting to delete entire filesystem" -``` - -### Custom Security Analyzer - -Implement custom risk analysis: - -```python -from openhands.sdk.security.analyzer import SecurityAnalyzerBase -from openhands.sdk.security.risk import SecurityRisk -from openhands.sdk.tool import Action - -class PatternBasedAnalyzer(SecurityAnalyzerBase): - dangerous_patterns = ["rm -rf", "sudo", "DROP TABLE"] - - def analyze( - self, - action: Action, - context: dict - ) -> tuple[SecurityRisk, str]: - command = getattr(action, "command", "") - - for pattern in self.dangerous_patterns: - if pattern in command: - return ( - SecurityRisk.HIGH, - f"Dangerous pattern detected: {pattern}" - ) - - return SecurityRisk.LOW, "No dangerous patterns found" -``` - -## Confirmation Policies - -**Source**: [`openhands/sdk/security/confirmation_policy.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/security/confirmation_policy.py) - -### Built-in Policies - -#### NeverConfirm - -Never request confirmation (default): - -```python -from openhands.sdk.security import NeverConfirm - -agent = Agent( - llm=llm, - tools=tools, - confirmation_policy=NeverConfirm() -) -``` - -#### AlwaysConfirm - -Always request confirmation: - -```python -from openhands.sdk.security import AlwaysConfirm - -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=AlwaysConfirm() -) -``` - -#### ConfirmOnHighRisk - -Confirm only high-risk actions: - -```python -from openhands.sdk.security import 
ConfirmOnHighRisk - -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=ConfirmOnHighRisk() -) -``` - -#### ConfirmOnMediumOrHighRisk - -Confirm medium and high-risk actions: - -```python -from openhands.sdk.security import ConfirmOnMediumOrHighRisk - -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=ConfirmOnMediumOrHighRisk() -) -``` - -### Custom Confirmation Policy - -Implement custom confirmation logic: - -```python -from openhands.sdk.security.confirmation_policy import ConfirmationPolicyBase -from openhands.sdk.security.risk import SecurityRisk -from openhands.sdk.tool import Action - -class TimeBasedPolicy(ConfirmationPolicyBase): - """Require confirmation during business hours.""" - - def should_confirm( - self, - action: Action, - risk: SecurityRisk, - context: dict - ) -> bool: - from datetime import datetime - - hour = datetime.now().hour - - # Business hours: always confirm high risk - if 9 <= hour <= 17: - return risk >= SecurityRisk.HIGH - - # Off hours: confirm medium and high risk - return risk >= SecurityRisk.MEDIUM -``` - -## Using Security System - -### Basic Setup - -```python -from openhands.sdk import Agent, LLM, Conversation -from openhands.sdk.security import ( - LLMSecurityAnalyzer, - ConfirmOnHighRisk -) -from pydantic import SecretStr - -# Create analyzer -security_analyzer = LLMSecurityAnalyzer( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ) -) - -# Create agent with security -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, - confirmation_policy=ConfirmOnHighRisk() -) - -# Use in conversation -conversation = Conversation(agent=agent) -``` - -See [`examples/01_standalone_sdk/04_human_in_the_loop.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py). 
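The comparisons used by custom policies like `TimeBasedPolicy` above (`risk >= SecurityRisk.HIGH`) rely on risk levels being ordered. Here is a self-contained sketch of the same hour-based rule, with the hour passed in explicitly so the logic is easy to test; the enum is an illustrative stand-in for `SecurityRisk`, not the SDK's class:

```python
from enum import IntEnum

class Risk(IntEnum):
    # Illustrative stand-in for SecurityRisk; IntEnum makes >= comparisons work.
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def needs_confirmation(risk: Risk, hour: int) -> bool:
    """Mirror TimeBasedPolicy: during business hours (9-17) confirm only HIGH;
    off hours, confirm MEDIUM and above."""
    if 9 <= hour <= 17:
        return risk >= Risk.HIGH
    return risk >= Risk.MEDIUM
```

Because the hour is a parameter rather than read from `datetime.now()`, both branches of the policy can be exercised deterministically.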
- -### Handling Confirmations - -```python -from openhands.sdk import Conversation -from openhands.sdk.conversation.state import AgentExecutionStatus - -conversation = Conversation(agent=agent) -conversation.send_message("Delete all temporary files") - -# Run agent -conversation.run() - -# Check if waiting for confirmation -if conversation.state.agent_status == AgentExecutionStatus.WAITING_FOR_CONFIRMATION: - print("Action requires confirmation:") - # Show pending action details - - # User approves - conversation.confirm_pending_action() - conversation.run() - - # Or user rejects - # conversation.reject_pending_action(reason="Too risky") -``` - -### Dynamic Policy Changes - -Change confirmation policy during execution: - -```python -from openhands.sdk.security import AlwaysConfirm, NeverConfirm - -conversation = Conversation(agent=agent) - -# Start with strict policy -conversation.set_confirmation_policy(AlwaysConfirm()) -conversation.send_message("Sensitive task") -conversation.run() - -# Switch to permissive policy -conversation.set_confirmation_policy(NeverConfirm()) -conversation.send_message("Safe task") -conversation.run() -``` - -## Security Workflow - -```mermaid -sequenceDiagram - participant Agent - participant Analyzer - participant Policy - participant User - participant Tool - - Agent->>Analyzer: analyze(action) - Analyzer->>Analyzer: Assess risk - Analyzer->>Agent: risk + explanation - - Agent->>Policy: should_confirm(action, risk) - Policy->>Policy: Apply policy rules - - alt No confirmation needed - Policy->>Agent: execute - Agent->>Tool: Execute action - Tool->>Agent: Observation - else Confirmation required - Policy->>User: Request approval - User->>Policy: Approve/Reject - alt Approved - Policy->>Agent: execute - Agent->>Tool: Execute action - else Rejected - Policy->>Agent: block - Agent->>Agent: UserRejectObservation - end - end -``` - -## Best Practices - -1. **Use LLM Analyzer**: Provides nuanced risk assessment -2. 
**Start Conservative**: Begin with strict policies, relax as needed -3. **Monitor Blocked Actions**: Review what's being blocked -4. **Provide Context**: Better context enables better risk assessment -5. **Test Security Setup**: Verify policies work as expected -6. **Document Policies**: Explain confirmation requirements to users -7. **Handle Rejections**: Implement proper error handling for rejected actions - -## Performance Considerations - -### LLM Analyzer Overhead - -LLM security analysis adds latency: -- **Cost**: Additional LLM call per action -- **Latency**: ~1-2 seconds per analysis -- **Tokens**: ~500-1000 tokens per analysis - -```python -# Only use with confirmation policy -agent = Agent( - llm=llm, - tools=tools, - security_analyzer=security_analyzer, # Costs tokens - confirmation_policy=ConfirmOnHighRisk() # Must be used together -) -``` - -### Optimization Tips - -1. **Cache Similar Actions**: Reuse assessments for similar actions -2. **Use Faster Models**: Consider faster LLMs for security analysis -3. **Pattern-Based Pre-Filter**: Use pattern matching before LLM analysis -4. 
**Batch Analysis**: Analyze multiple actions together when possible - -## Security Best Practices - -### Principle of Least Privilege - -```python -# Provide only necessary tools -agent = Agent( - llm=llm, - tools=[ - FileEditorTool.create(), # Safe file operations - # Don't include BashTool for untrusted tasks - ] -) -``` - -### Sandbox Execution - -```python -# Use DockerWorkspace for isolation -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - image="ubuntu:22.04" -) - -conversation = Conversation(agent=agent, workspace=workspace) -``` - -### Secrets Management - -```python -# Provide secrets securely -conversation = Conversation( - agent=agent, - secrets={ - "API_KEY": "secret-value", - "PASSWORD": "secure-password" - } -) -``` - -See [`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). - -## See Also - -- **[Agent](/sdk/architecture/sdk/agent.mdx)** - Agent configuration with security -- **[Conversation](/sdk/architecture/sdk/conversation.mdx)** - Handling confirmations -- **[Tools](/sdk/architecture/sdk/tool.mdx)** - Tool security considerations -- **[Human-in-the-Loop Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/04_human_in_the_loop.py)** - Complete example diff --git a/sdk/arch/sdk/tool.mdx b/sdk/arch/sdk/tool.mdx deleted file mode 100644 index 3bbe737e..00000000 --- a/sdk/arch/sdk/tool.mdx +++ /dev/null @@ -1,199 +0,0 @@ ---- -title: Tool System -description: Define custom tools for agents to interact with external systems through typed action/observation patterns. ---- - -The tool system enables agents to interact with external systems and perform actions. Tools follow a typed action/observation pattern with comprehensive validation and schema generation. 
- -## Core Concepts - -```mermaid -graph LR - Action[Action] --> Tool[Tool] - Tool --> Executor[ToolExecutor] - Executor --> Observation[Observation] - - style Action fill:#e1f5fe - style Tool fill:#f3e5f5 - style Executor fill:#fff3e0 - style Observation fill:#e8f5e8 -``` - -A tool consists of three components: -- **Action**: Input schema defining tool parameters -- **ToolExecutor**: Logic that executes the tool -- **Observation**: Output schema with execution results - -**Source**: [`openhands/sdk/tool/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool) - -## Defining Custom Tools - -### 1. Define Action and Observation - -**Source**: [`openhands/sdk/tool/schema.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/schema.py) - -```python -from openhands.sdk.tool import Action, Observation - -class CalculateAction(Action): - """Action to perform calculation.""" - expression: str - precision: int = 2 - -class CalculateObservation(Observation): - """Result of calculation.""" - result: float - success: bool - error: str | None = None # populated when evaluation fails -``` - -### 2. Implement ToolExecutor - -**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py) - -```python -from openhands.sdk.tool import ToolExecutor - -class CalculateExecutor(ToolExecutor[CalculateAction, CalculateObservation]): - def __call__(self, action: CalculateAction) -> CalculateObservation: - try: - # eval runs arbitrary code; fine for this illustration, unsafe in production - result = eval(action.expression) - return CalculateObservation( - result=round(result, action.precision), - success=True - ) - except Exception as e: - return CalculateObservation( - result=0.0, - success=False, - error=str(e) - ) -``` - -### 3.
Create Tool Class - -```python -from openhands.sdk.tool import Tool - -class CalculateTool(Tool[CalculateAction, CalculateObservation]): - name: str = "calculate" - description: str = "Evaluate mathematical expressions" - action_type: type[Action] = CalculateAction - observation_type: type[Observation] = CalculateObservation - - @classmethod - def create(cls) -> list["CalculateTool"]: - executor = CalculateExecutor() - return [cls().set_executor(executor)] -``` - -### Complete Example - -See [`examples/01_standalone_sdk/02_custom_tools.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) for a working example. - -## Built-in Tools - -**Source**: [`openhands/sdk/tool/builtins/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/tool/builtins) - -### FinishTool - -**Source**: [`openhands/sdk/tool/builtins/finish.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/finish.py) - -Signals task completion with optional output. - -```python -from openhands.sdk.tool.builtins import FinishTool - -# Automatically included with agents -finish_tool = FinishTool.create() -``` - -### ThinkTool - -**Source**: [`openhands/sdk/tool/builtins/think.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/builtins/think.py) - -Enables internal reasoning without external actions. 
- -```python -from openhands.sdk.tool.builtins import ThinkTool - -# Automatically included with agents -think_tool = ThinkTool.create() -``` - -## Tool Annotations - -**Source**: [`openhands/sdk/tool/tool.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/tool.py) - -Provide hints about tool behavior following [MCP spec](https://modelcontextprotocol.io/): - -```python -from openhands.sdk.tool import ToolAnnotations - -annotations = ToolAnnotations( - title="Calculate", - readOnlyHint=True, # Tool doesn't modify environment - destructiveHint=False, # Tool doesn't perform destructive updates - idempotentHint=True, # Same input produces same output - openWorldHint=False # Tool doesn't interact with external entities -) - -class CalculateTool(Tool[CalculateAction, CalculateObservation]): - annotations: ToolAnnotations = annotations - # ... rest of tool definition -``` - -## Tool Registry - -**Source**: [`openhands/sdk/tool/registry.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/tool/registry.py) - -Tools are automatically registered when defined. The registry manages tool discovery and schema generation for LLM function calling. 
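A registry of this kind is, at heart, a name-to-tool mapping plus per-tool schemas handed to the LLM's function-calling API. A minimal stdlib sketch of the idea (not the SDK's actual registry; the function names and schema layout here are illustrative):

```python
_REGISTRY: dict[str, dict] = {}

def register_tool(name: str, description: str, parameters: dict) -> None:
    """Record a tool and the JSON-schema-style parameters the LLM will see."""
    _REGISTRY[name] = {
        "name": name,
        "description": description,
        "parameters": parameters,
    }

def function_calling_specs() -> list[dict]:
    """Roughly what gets passed as the tools/functions argument of an LLM call."""
    return list(_REGISTRY.values())

# Registering the Calculate example from earlier in this page:
register_tool(
    "calculate",
    "Evaluate mathematical expressions",
    {
        "type": "object",
        "properties": {
            "expression": {"type": "string"},
            "precision": {"type": "integer", "default": 2},
        },
        "required": ["expression"],
    },
)
```

In the SDK this bookkeeping happens automatically from the Pydantic action models, so tool authors never build the schema dictionaries by hand.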
- -## Advanced Patterns - -### Stateful Executors - -Executors can maintain state across executions: - -```python -class DatabaseExecutor(ToolExecutor[QueryAction, QueryObservation]): - def __init__(self, connection_string: str): - self.connection = connect(connection_string) - - def __call__(self, action: QueryAction) -> QueryObservation: - result = self.connection.execute(action.query) - return QueryObservation(rows=result.fetchall()) - - def close(self) -> None: - """Clean up resources.""" - self.connection.close() -``` - -### Dynamic Tool Creation - -Create tools with runtime configuration: - -```python -class ConfigurableTool(Tool[MyAction, MyObservation]): - @classmethod - def create(cls, api_key: str, endpoint: str) -> list["ConfigurableTool"]: - executor = MyExecutor(api_key=api_key, endpoint=endpoint) - return [cls().set_executor(executor)] - -# Use with different configurations -tool1 = ConfigurableTool.create(api_key="key1", endpoint="https://api1.com") -tool2 = ConfigurableTool.create(api_key="key2", endpoint="https://api2.com") -``` - -## Best Practices - -1. **Type Safety**: Use Pydantic models for actions and observations -2. **Error Handling**: Always handle exceptions in executors -3. **Resource Management**: Implement `close()` for cleanup -4. **Clear Descriptions**: Provide detailed docstrings for LLM understanding -5. 
**Validation**: Leverage Pydantic validators for input validation - -## See Also - -- **[Pre-defined Tools](/sdk/architecture/tools/)** - Ready-to-use tool implementations -- **[MCP Integration](/sdk/architecture/sdk/mcp.mdx)** - Connect to external MCP tools -- **[Agent Usage](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents diff --git a/sdk/arch/sdk/workspace.mdx b/sdk/arch/sdk/workspace.mdx deleted file mode 100644 index 42d61900..00000000 --- a/sdk/arch/sdk/workspace.mdx +++ /dev/null @@ -1,322 +0,0 @@ ---- -title: Workspace Interface -description: Abstract interface for agent execution environments supporting local and remote operations. ---- - -The workspace interface defines how agents interact with their execution environment. It provides a unified API for file operations and command execution, supporting both local and remote environments. - -**Source**: [`openhands/sdk/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace) - -## Core Concepts - -```mermaid -graph TD - BaseWorkspace[BaseWorkspace] --> Local[LocalWorkspace] - BaseWorkspace --> Remote[RemoteWorkspace] - - Local --> FileOps[File Operations] - Local --> CmdExec[Command Execution] - - Remote --> Docker[DockerWorkspace] - Remote --> API[RemoteAPIWorkspace] - - style BaseWorkspace fill:#e1f5fe - style Local fill:#e8f5e8 - style Remote fill:#fff3e0 -``` - -A workspace provides: -- **File Operations**: Upload, download, read, write -- **Command Execution**: Run bash commands with timeout support -- **Resource Management**: Context manager protocol for cleanup -- **Flexibility**: Local development or remote sandboxed execution - -## Base Interface - -**Source**: [`openhands/sdk/workspace/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/base.py) - -### BaseWorkspace - -Abstract base class defining the workspace interface: - -```python -from openhands.sdk.workspace import BaseWorkspace - -class 
CustomWorkspace(BaseWorkspace): - working_dir: str # Required: working directory path - - def execute_command( - self, - command: str, - cwd: str | None = None, - timeout: float = 30.0 - ) -> CommandResult: - """Execute bash command.""" - ... - - def file_upload( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - """Upload file to workspace.""" - ... - - def file_download( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - """Download file from workspace.""" - ... -``` - -### Context Manager Protocol - -All workspaces support the context manager protocol for safe resource management: - -```python -with workspace: - result = workspace.execute_command("echo 'hello'") - # Workspace automatically cleans up on exit -``` - -## LocalWorkspace - -**Source**: [`openhands/sdk/workspace/local.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/local.py) - -Executes operations directly on the local machine. - -```python -from openhands.sdk.workspace import LocalWorkspace - -workspace = LocalWorkspace(working_dir="/path/to/project") - -# Execute command -result = workspace.execute_command("ls -la") -print(result.stdout) - -# Upload file (copy) -workspace.file_upload("local_file.txt", "workspace_file.txt") - -# Download file (copy) -workspace.file_download("workspace_file.txt", "local_copy.txt") -``` - -**Use Cases**: -- Local development and testing -- Direct file system access -- No sandboxing required -- Fast execution without network overhead - -## RemoteWorkspace - -**Source**: [`openhands/sdk/workspace/remote/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/sdk/workspace/remote) - -Abstract base for remote execution environments. 
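Conceptually, a remote workspace forwards each operation over the network and maps the server's reply onto the same result model a local workspace returns. A stdlib-only sketch of that idea; the `/execute` endpoint and payload shape are made up for illustration and are not the SDK's wire protocol:

```python
import json
from dataclasses import dataclass
from urllib import request

@dataclass
class CommandResult:
    stdout: str
    stderr: str
    exit_code: int
    duration: float

def parse_result(payload: dict) -> CommandResult:
    """Map a server reply onto the same result model a local workspace returns."""
    return CommandResult(
        stdout=payload["stdout"],
        stderr=payload["stderr"],
        exit_code=payload["exit_code"],
        duration=payload["duration"],
    )

def execute_remote(api_url: str, command: str, timeout: float = 30.0) -> CommandResult:
    """POST the command to a hypothetical /execute endpoint and parse the reply."""
    body = json.dumps({"command": command, "timeout": timeout}).encode()
    req = request.Request(
        f"{api_url}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return parse_result(json.load(resp))
```

Keeping the result model identical on both sides is what lets agent code stay agnostic about whether it is running against a local or a remote workspace.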
- -### RemoteWorkspace Mixin - -**Source**: [`openhands/sdk/workspace/remote/base.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/base.py) - -Provides common functionality for remote workspaces: -- Network communication -- File transfer protocols -- Command execution over API -- Resource cleanup - -### AsyncRemoteWorkspace - -**Source**: [`openhands/sdk/workspace/remote/async_remote_workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/remote/async_remote_workspace.py) - -Async version for concurrent operations. - -## Concrete Remote Implementations - -Remote workspace implementations are provided in the `workspace` package: - -### DockerWorkspace - -**Source**: See [workspace/docker documentation](/sdk/architecture/workspace/docker.mdx) - -Executes operations in an isolated Docker container. - -```python -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - image="ubuntu:22.04", - container_name="agent-sandbox" -) - -with workspace: - result = workspace.execute_command("python script.py") -``` - -**Benefits**: -- Strong isolation and sandboxing -- Reproducible environments -- Resource limits and security -- Clean slate for each session - -### RemoteAPIWorkspace - -**Source**: See [workspace/remote_api documentation](/sdk/architecture/workspace/remote_api.mdx) - -Connects to a remote agent server via API. 
- -```python -from openhands.workspace import RemoteAPIWorkspace - -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com", - api_key="your-api-key" -) - -with workspace: - result = workspace.execute_command("npm test") -``` - -**Benefits**: -- Centralized agent execution -- Shared resources and caching -- Scalable architecture -- Remote monitoring and logging - -## Result Models - -**Source**: [`openhands/sdk/workspace/models.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/workspace/models.py) - -### CommandResult - -```python -class CommandResult(BaseModel): - stdout: str # Standard output - stderr: str # Standard error - exit_code: int # Exit code (0 = success) - duration: float # Execution time in seconds -``` - -### FileOperationResult - -```python -class FileOperationResult(BaseModel): - success: bool # Operation success status - message: str # Status message - path: str # File path -``` - -## Usage with Agents - -Workspaces integrate with agents through tools: - -```python -from openhands.sdk import Agent, LLM -from openhands.tools import BashTool, FileEditorTool -from openhands.sdk.workspace import LocalWorkspace - -# Create workspace -workspace = LocalWorkspace(working_dir="/project") - -# Create tools with workspace -tools = [ - BashTool.create(working_dir=workspace.working_dir), - FileEditorTool.create() -] - -# Create agent -agent = Agent(llm=llm, tools=tools) -``` - -## Local vs Remote Comparison - -| Feature | LocalWorkspace | RemoteWorkspace | -|---------|---------------|-----------------| -| **Execution** | Local machine | Remote server/container | -| **Isolation** | None | Strong (Docker/API) | -| **Performance** | Fast | Network latency | -| **Security** | Host system | Sandboxed environment | -| **Setup** | Simple | Requires infrastructure | -| **Use Case** | Development | Production/Multi-user | - -## Advanced Usage - -### Custom Workspace Implementation - -```python -from 
openhands.sdk.workspace import BaseWorkspace -from openhands.sdk.workspace.models import CommandResult, FileOperationResult - -class CloudWorkspace(BaseWorkspace): - working_dir: str - cloud_instance_id: str - - def execute_command( - self, - command: str, - cwd: str | None = None, - timeout: float = 30.0 - ) -> CommandResult: - # Execute on cloud instance - response = self.cloud_api.run_command( - instance_id=self.cloud_instance_id, - command=command - ) - return CommandResult( - stdout=response.stdout, - stderr=response.stderr, - exit_code=response.exit_code, - duration=response.duration - ) - - def file_upload( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - # Upload to cloud storage - ... - - def file_download( - self, - source_path: str, - destination_path: str - ) -> FileOperationResult: - # Download from cloud storage - ... -``` - -### Error Handling - -```python -from openhands.sdk.workspace import LocalWorkspace - -workspace = LocalWorkspace(working_dir="/project") - -try: - result = workspace.execute_command("risky_command", timeout=60.0) - if result.exit_code != 0: - print(f"Command failed: {result.stderr}") -except TimeoutError: - print("Command timed out") -except Exception as e: - print(f"Execution error: {e}") -``` - -## Best Practices - -1. **Use Context Managers**: Always use `with` statements for proper cleanup -2. **Set Appropriate Timeouts**: Prevent hanging on long-running commands -3. **Validate Working Directory**: Ensure paths exist before operations -4. **Handle Errors**: Check exit codes and handle exceptions -5. **Choose Right Workspace**: Local for development, remote for production -6. 
**Resource Limits**: Set appropriate resource limits for remote workspaces - -## See Also - -- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker-based sandboxing -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution -- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Remote agent execution server -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace usage examples diff --git a/sdk/arch/tools/bash.mdx b/sdk/arch/tools/bash.mdx deleted file mode 100644 index 3497307c..00000000 --- a/sdk/arch/tools/bash.mdx +++ /dev/null @@ -1,288 +0,0 @@ ---- -title: BashTool -description: Execute bash commands with persistent session support, timeout control, and environment management. ---- - -BashTool enables agents to execute bash commands in a persistent session with full control over working directory, environment variables, and execution timeout. - -**Source**: [`openhands/tools/execute_bash/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash) - -## Overview - -BashTool provides: -- Persistent bash session across multiple commands -- Environment variable management -- Timeout control for long-running commands -- Working directory configuration -- Support for both local and remote execution - -## Usage - -### Basic Usage - -```python -from openhands.tools import BashTool - -# Create tool -bash_tool = BashTool.create() - -# Use with agent -from openhands.sdk import Agent - -agent = Agent( - llm=llm, - tools=[bash_tool] -) -``` - -### With Configuration - -```python -bash_tool = BashTool.create( - working_dir="/project/path", - timeout=60.0 # 60 seconds -) -``` - -## Action Model - -**Source**: [`openhands/tools/execute_bash/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/definition.py) - -```python -class BashAction(Action): - command: str # Bash 
command to execute - thought: str = "" # Optional reasoning -``` - -### Example - -```python -from openhands.tools import BashAction - -action = BashAction( - command="ls -la", - thought="List files to understand directory structure" -) -``` - -## Observation Model - -```python -class BashObservation(Observation): - output: str # Command output (stdout + stderr) - exit_code: int # Exit code (0 = success) -``` - -### Example - -```python -# Successful execution -observation = BashObservation( - output="file1.txt\nfile2.py\n", - exit_code=0 -) - -# Failed execution -observation = BashObservation( - output="command not found: invalid_cmd\n", - exit_code=127 -) -``` - -## Features - -### Persistent Session - -Commands execute in the same bash session, preserving: -- Environment variables -- Working directory changes -- Shell state - -```python -# Set environment variable -agent.run("export API_KEY=secret") - -# Use in next command -agent.run("echo $API_KEY") # Outputs: secret -``` - -### Terminal Types - -**Source**: [`openhands/tools/execute_bash/terminal/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/execute_bash/terminal) - -BashTool supports multiple terminal implementations: - -- **SubprocessTerminal**: Direct subprocess execution (default) -- **TmuxTerminal**: Tmux-based persistent sessions - -### Timeout Control - -Commands automatically timeout after the specified duration: - -```python -bash_tool = BashTool.create(timeout=30.0) # 30 second timeout - -# Long-running command will be terminated -action = BashAction(command="sleep 60") # Timeout after 30s -``` - -### Environment Management - -Set custom environment variables: - -```python -# Via workspace secrets -from openhands.sdk import Conversation - -conversation = Conversation( - agent=agent, - secrets={ - "DATABASE_URL": "postgres://...", - "API_KEY": "secret" - } -) -``` - -See 
[`examples/01_standalone_sdk/12_custom_secrets.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py). - -## Common Use Cases - -### File Operations - -```python -# Create directory -BashAction(command="mkdir -p /path/to/dir") - -# Copy files -BashAction(command="cp source.txt dest.txt") - -# Find files -BashAction(command="find . -name '*.py'") -``` - -### Build and Test - -```python -# Install dependencies -BashAction(command="pip install -r requirements.txt") - -# Run tests -BashAction(command="pytest tests/") - -# Build project -BashAction(command="npm run build") -``` - -### Git Operations - -```python -# Clone repository -BashAction(command="git clone https://github.com/user/repo.git") - -# Create branch -BashAction(command="git checkout -b feature-branch") - -# Commit changes -BashAction(command='git commit -m "Add feature"') -``` - -### System Information - -```python -# Check disk space -BashAction(command="df -h") - -# List processes -BashAction(command="ps aux") - -# Network information -BashAction(command="ifconfig") -``` - -## Best Practices - -1. **Set Appropriate Timeouts**: Prevent hanging on long commands -2. **Use Absolute Paths**: Or configure working directory explicitly -3. **Check Exit Codes**: Verify command success in agent logic -4. **Escape Special Characters**: Properly quote arguments -5. **Avoid Interactive Commands**: BashTool works best with non-interactive commands -6. 
**Use Security Analysis**: Enable for sensitive operations - -## Security Considerations - -### Risk Assessment - -BashTool actions have varying risk levels: - -- **LOW**: Read operations (`ls`, `cat`, `grep`) -- **MEDIUM**: Write operations (`touch`, `mkdir`, `echo >`) -- **HIGH**: Destructive operations (`rm -rf`, `sudo`, `chmod`) - -### Enable Security - -```python -from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk - -agent = Agent( - llm=llm, - tools=[BashTool.create()], - security_analyzer=LLMSecurityAnalyzer(llm=llm), - confirmation_policy=ConfirmOnHighRisk() -) -``` - -### Sandboxing - -Use DockerWorkspace for isolation: - -```python -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - image="ubuntu:22.04" -) - -conversation = Conversation(agent=agent, workspace=workspace) -``` - -## Error Handling - -### Common Exit Codes - -- `0`: Success -- `1`: General error -- `2`: Misuse of shell builtin -- `126`: Command not executable -- `127`: Command not found -- `130`: Terminated by Ctrl+C -- `137`: Killed by SIGKILL (timeout) - -### Handling Failures - -```python -# Agent can check observation -if observation.exit_code != 0: - # Handle error based on output - if "permission denied" in observation.output.lower(): - # Retry with different approach - pass -``` - -## Implementation Details - -**Source**: [`openhands/tools/execute_bash/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/execute_bash/impl.py) - -The tool uses a terminal interface that: -1. Initializes a persistent bash session -2. Executes commands with timeout support -3. Captures stdout and stderr -4. Returns exit codes -5. 
Handles session cleanup - -## See Also - -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For file manipulation -- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools -- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/browser_use.mdx b/sdk/arch/tools/browser_use.mdx deleted file mode 100644 index bd52db73..00000000 --- a/sdk/arch/tools/browser_use.mdx +++ /dev/null @@ -1,101 +0,0 @@ ---- -title: BrowserUseTool -description: Web browsing and interaction capabilities powered by browser-use integration. ---- - -BrowserUseTool enables agents to interact with web pages, navigate websites, and extract web content through an integrated browser. - -**Source**: [`openhands/tools/browser_use/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/browser_use) - -## Overview - -BrowserUseTool provides: -- Web page navigation -- Element interaction (click, type, etc.) -- Content extraction -- Screenshot capture -- JavaScript execution - -## Usage - -```python -from openhands.tools import BrowserUseTool - -agent = Agent(llm=llm, tools=[BrowserUseTool.create()]) -``` - -## Features - -### Web Navigation - -- Navigate to URLs -- Follow links -- Browser back/forward -- Page refresh - -### Element Interaction - -- Click elements -- Fill forms -- Submit data -- Select dropdowns - -### Content Extraction - -- Extract text content -- Get element attributes -- Capture screenshots -- Parse structured data - -## Use Cases - -### Web Scraping - -```python -# Navigate to page and extract data -# Agent can use browser to: -# 1. Navigate to target URL -# 2. Wait for content to load -# 3. Extract desired information -# 4. Return structured data -``` - -### Web Testing - -```python -# Test web applications -# Agent can: -# 1. Navigate to application -# 2. Fill out forms -# 3. 
Click buttons -# 4. Verify expected behavior -``` - -### Research - -```python -# Research information online -# Agent can: -# 1. Search for information -# 2. Navigate search results -# 3. Extract relevant content -# 4. Synthesize findings -``` - -## Integration - -BrowserUseTool is powered by the [browser-use](https://github.com/browser-use/browser-use) library, providing robust web automation capabilities. - -## Best Practices - -1. **Handle Loading**: Wait for page content to load -2. **Error Handling**: Handle navigation and interaction failures -3. **Rate Limiting**: Be respectful of target websites -4. **Security**: Avoid sensitive operations in browser -5. **Timeouts**: Set appropriate timeouts for operations - -## See Also - -- **[browser-use](https://github.com/browser-use/browser-use)** - Underlying browser automation library -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For local command execution -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - For processing extracted content diff --git a/sdk/arch/tools/file_editor.mdx b/sdk/arch/tools/file_editor.mdx deleted file mode 100644 index fff65d25..00000000 --- a/sdk/arch/tools/file_editor.mdx +++ /dev/null @@ -1,338 +0,0 @@ ---- -title: FileEditorTool -description: Edit files with diff-based operations, undo support, and intelligent line-based modifications. ---- - -FileEditorTool provides powerful file editing capabilities with diff-based operations, undo/redo support, and intelligent line-based modifications. It's designed for precise code and text file manipulation. 
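The diff-based operations mentioned above report changes as standard unified diffs. For intuition, the same format can be produced with Python's stdlib `difflib`; this is an SDK-free illustration of the format, not the tool's actual implementation:

```python
import difflib

# Two versions of a file, as lists of lines (with trailing newlines kept).
before = ['def main():\n', '    print("old")\n']
after = ['def main():\n', '    print("new")\n']

# Unified diff in the same format FileEditorTool reports after an edit.
diff = "".join(
    difflib.unified_diff(before, after, fromfile="script.py", tofile="script.py")
)
print(diff)
```

Reading these diffs is useful when reviewing an agent's edits: removed lines are prefixed with `-`, added lines with `+`, and unchanged context lines with a space.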
- -**Source**: [`openhands/tools/file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/file_editor) - -## Overview - -FileEditorTool provides: -- View file contents with line numbers -- Insert, delete, and replace lines -- String-based find-and-replace -- Undo/redo support -- Automatic diff generation -- File history tracking - -## Usage - -```python -from openhands.tools import FileEditorTool - -# Create tool -file_editor = FileEditorTool.create() - -# Use with agent -agent = Agent(llm=llm, tools=[file_editor]) -``` - -## Available Commands - -**Source**: [`openhands/tools/file_editor/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/definition.py) - -### view -View file contents with line numbers. - -```python -FileEditAction( - command="view", - path="script.py" -) -``` - -Optional parameters: -- `view_range=[start, end]`: View specific line range - -### create -Create a new file with content. - -```python -FileEditAction( - command="create", - path="newfile.py", - file_text="print('Hello, World!')\n" -) -``` - -### str_replace -Replace a string in the file. - -```python -FileEditAction( - command="str_replace", - path="script.py", - old_str="old_function()", - new_str="new_function()" -) -``` - -### insert -Insert text after a specific line. - -```python -FileEditAction( - command="insert", - path="script.py", - insert_line=10, - new_str=" # New code here\n" -) -``` - -### undo_edit -Undo the last edit operation. 
- -```python -FileEditAction( - command="undo_edit", - path="script.py" -) -``` - -## Action Model - -```python -class FileEditAction(Action): - command: Literal["view", "create", "str_replace", "insert", "undo_edit"] - path: str # File path - file_text: str | None = None # For create - old_str: str | None = None # For str_replace - new_str: str | None = None # For str_replace/insert - insert_line: int | None = None # For insert - view_range: list[int] | None = None # For view -``` - -## Observation Model - -```python -class FileEditObservation(Observation): - content: str # Result message or file content - success: bool # Operation success status - diff: str | None = None # Unified diff for changes -``` - -## Features - -### Diff Generation - -Automatic diff generation for all modifications: - -```python -# After edit -observation = FileEditObservation( - content="File edited successfully", - success=True, - diff=""" ---- script.py -+++ script.py -@@ -1,3 +1,3 @@ - def main(): -- print("old") -+ print("new") -""" -) -``` - -### Edit History - -Track file modification history with undo support: - -```python -# Edit file -action1 = FileEditAction(command="str_replace", path="file.py", ...) - -# Make another edit -action2 = FileEditAction(command="insert", path="file.py", ...) 
- -# Undo last edit -action3 = FileEditAction(command="undo_edit", path="file.py") -``` - -### Line-Based Operations - -All operations work with line numbers for precision: - -```python -# View specific lines -FileEditAction( - command="view", - path="large_file.py", - view_range=[100, 150] # View lines 100-150 -) - -# Insert at specific line -FileEditAction( - command="insert", - path="script.py", - insert_line=25, - new_str=" new_code()\n" -) -``` - -### String Replacement - -Find and replace with exact matching: - -```python -# Must match exactly including whitespace -FileEditAction( - command="str_replace", - path="config.py", - old_str="DEBUG = False\nLOG_LEVEL = 'INFO'", - new_str="DEBUG = True\nLOG_LEVEL = 'DEBUG'" -) -``` - -## Common Use Cases - -### Creating Files - -```python -# Create Python script -FileEditAction( - command="create", - path="hello.py", - file_text="#!/usr/bin/env python3\nprint('Hello, World!')\n" -) - -# Create configuration file -FileEditAction( - command="create", - path="config.json", - file_text='{"setting": "value"}\n' -) -``` - -### Viewing Files - -```python -# View entire file -FileEditAction(command="view", path="README.md") - -# View specific section -FileEditAction( - command="view", - path="large_file.py", - view_range=[1, 50] -) - -# View end of file -FileEditAction( - command="view", - path="log.txt", - view_range=[-20, -1] # Last 20 lines -) -``` - -### Refactoring Code - -```python -# Rename function -FileEditAction( - command="str_replace", - path="module.py", - old_str="def old_name(arg):", - new_str="def new_name(arg):" -) - -# Add import -FileEditAction( - command="insert", - path="script.py", - insert_line=0, - new_str="import numpy as np\n" -) - -# Fix bug -FileEditAction( - command="str_replace", - path="buggy.py", - old_str=" if x = 5:", - new_str=" if x == 5:" -) -``` - -## Best Practices - -1. **View Before Editing**: Always view file content first -2. 
**Exact String Matching**: Ensure `old_str` matches exactly -3. **Include Context**: Include surrounding lines for uniqueness -4. **Use Line Numbers**: View with line numbers for precise edits -5. **Check Success**: Verify `observation.success` before proceeding -6. **Review Diffs**: Check generated diffs for accuracy -7. **Use Undo Sparingly**: Undo only when necessary - -## Error Handling - -### Common Errors - -```python -# File not found -FileEditObservation( - content="Error: File 'missing.py' not found", - success=False -) - -# String not found -FileEditObservation( - content="Error: old_str not found in file", - success=False -) - -# Multiple matches -FileEditObservation( - content="Error: old_str matched multiple locations", - success=False -) - -# Invalid line number -FileEditObservation( - content="Error: insert_line out of range", - success=False -) -``` - -### Recovery Strategies - -```python -# If string not found, view file first -if not observation.success and "not found" in observation.content: - # View file to understand current content - view_action = FileEditAction(command="view", path=path) -``` - -## Implementation Details - -**Source**: [`openhands/tools/file_editor/impl.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/impl.py) - -The editor maintains: -- **File Cache**: Efficient file content caching -- **Edit History**: Per-file undo stack -- **Diff Engine**: Unified diff generation -- **Encoding Detection**: Automatic encoding handling - -## Configuration - -**Source**: [`openhands/tools/file_editor/utils/config.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/file_editor/utils/config.py) - -```python -# Constants -MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB -MAX_HISTORY_SIZE = 100 # Max undo operations -``` - -## Security Considerations - -- File operations are restricted to working directory -- No execution of file content -- Safe for user-generated content -- Automatic encoding 
detection prevents binary file issues - -## See Also - -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - For file system operations -- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing -- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Usage examples diff --git a/sdk/arch/tools/glob.mdx b/sdk/arch/tools/glob.mdx deleted file mode 100644 index 8983d0af..00000000 --- a/sdk/arch/tools/glob.mdx +++ /dev/null @@ -1,89 +0,0 @@ ---- -title: GlobTool -description: Find files using glob patterns with recursive search and flexible matching. ---- - -GlobTool enables file discovery using glob patterns, supporting recursive search, wildcards, and flexible path matching. - -**Source**: [`openhands/tools/glob/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/glob) - -## Usage - -```python -from openhands.tools import GlobTool - -agent = Agent(llm=llm, tools=[GlobTool.create()]) -``` - -## Action Model - -**Source**: [`openhands/tools/glob/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/glob/definition.py) - -```python -class GlobAction(Action): - pattern: str # Glob pattern (e.g., "**/*.py") -``` - -## Observation Model - -```python -class GlobObservation(Observation): - paths: list[str] # List of matching file paths -``` - -## Pattern Syntax - -- `*`: Match any characters except `/` -- `**`: Match any characters including `/` (recursive) -- `?`: Match single character -- `[abc]`: Match any character in brackets -- `[!abc]`: Match any character not in brackets - -## Examples - -### Find Python Files - -```python -GlobAction(pattern="**/*.py") -# Returns: ["src/main.py", "tests/test_main.py", ...] -``` - -### Find Specific Files - -```python -GlobAction(pattern="**/test_*.py") -# Returns: ["tests/test_api.py", "tests/test_utils.py", ...] 
-``` - -### Multiple Extensions - -```python -GlobAction(pattern="**/*.{py,js,ts}") -# Returns: ["script.py", "app.js", "types.ts", ...] -``` - -### Current Directory Only - -```python -GlobAction(pattern="*.txt") -# Returns: ["readme.txt", "notes.txt", ...] -``` - -## Common Use Cases - -- **Code Discovery**: `**/*.py` - Find all Python files -- **Test Files**: `**/test_*.py` - Find test files -- **Configuration**: `**/*.{json,yaml,yml}` - Find config files -- **Documentation**: `**/*.md` - Find markdown files - -## Best Practices - -1. **Use Recursive Patterns**: `**/*` for deep searches -2. **Specific Extensions**: Narrow results with extensions -3. **Combine with GrepTool**: Find files, then search content -4. **Check Results**: Handle empty result lists - -## See Also - -- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative file operations diff --git a/sdk/arch/tools/grep.mdx b/sdk/arch/tools/grep.mdx deleted file mode 100644 index bd879318..00000000 --- a/sdk/arch/tools/grep.mdx +++ /dev/null @@ -1,140 +0,0 @@ ---- -title: GrepTool -description: Search file contents using regex patterns with context and match highlighting. ---- - -GrepTool enables content search across files using regex patterns, providing context around matches and detailed results. - -**Source**: [`openhands/tools/grep/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/grep) - -## Usage - -```python -from openhands.tools import GrepTool - -agent = Agent(llm=llm, tools=[GrepTool.create()]) -``` - -## Action Model - -**Source**: [`openhands/tools/grep/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/grep/definition.py) - -```python -class GrepAction(Action): - pattern: str # Regex pattern to search - path: str = "." 
# Directory or file to search - case_sensitive: bool = False # Case sensitivity -``` - -## Observation Model - -```python -class GrepObservation(Observation): - matches: list[dict] # List of matches with context - # Each match contains: - # - file: str - File path - # - line: int - Line number - # - content: str - Matching line -``` - -## Examples - -### Search for Function Definition - -```python -GrepAction( - pattern=r"def\s+\w+\(", - path="src/", - case_sensitive=False -) -# Returns: [ -# {"file": "src/main.py", "line": 10, "content": "def process_data(x):"}, -# ... -# ] -``` - -### Case-Sensitive Search - -```python -GrepAction( - pattern="TODO", - path=".", - case_sensitive=True -) -# Only matches exact case "TODO" -``` - -### Search Specific File - -```python -GrepAction( - pattern="import.*pandas", - path="script.py" -) -``` - -## Pattern Syntax - -Supports Python regex patterns: -- `.`: Any character -- `*`: Zero or more -- `+`: One or more -- `?`: Optional -- `[]`: Character class -- `()`: Group -- `|`: Alternation -- `^`: Line start -- `$`: Line end - -## Common Use Cases - -### Find TODOs - -```python -GrepAction(pattern=r"TODO|FIXME|XXX", path=".") -``` - -### Find Imports - -```python -GrepAction(pattern=r"^import |^from .* import ", path="src/") -``` - -### Find API Keys (for security review) - -```python -GrepAction(pattern=r"api[_-]key|secret|password", path=".") -``` - -### Find Function Calls - -```python -GrepAction(pattern=r"database\.query\(", path=".") -``` - -## Best Practices - -1. **Escape Special Characters**: Use `\` for regex special chars -2. **Use Anchors**: `^` and `$` for line boundaries -3. **Case Insensitive Default**: Unless exact case matters -4. **Narrow Search Paths**: Search specific directories -5. **Combine with GlobTool**: Find files first, then grep - -## Workflow Pattern - -```python -# 1. Find relevant files -glob_action = GlobAction(pattern="**/*.py") - -# 2. 
Search content in those files -grep_action = GrepAction( - pattern="class.*Exception", - path="src/" -) -``` - -## See Also - -- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - View/edit files -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Alternative with `grep` command diff --git a/sdk/arch/tools/overview.mdx b/sdk/arch/tools/overview.mdx deleted file mode 100644 index aadf3f01..00000000 --- a/sdk/arch/tools/overview.mdx +++ /dev/null @@ -1,185 +0,0 @@ ---- -title: Tools Overview -description: Pre-built tools for common agent operations including bash execution, file editing, and code search. ---- - -The `openhands.tools` package provides a collection of pre-built, production-ready tools for common agent operations. These tools enable agents to interact with files, execute commands, search code, and manage tasks. - -**Source**: [`openhands/tools/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools) - -## Available Tools - -### Core Tools - -- **[BashTool](/sdk/architecture/tools/bash.mdx)** - Execute bash commands with timeout and environment support -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Edit files with diff-based operations and undo support -- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning workflows - -### Search Tools - -- **[GlobTool](/sdk/architecture/tools/glob.mdx)** - Find files using glob patterns -- **[GrepTool](/sdk/architecture/tools/grep.mdx)** - Search file contents with regex support - -### Specialized Tools - -- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Track and manage agent tasks -- **[BrowserUseTool](/sdk/architecture/tools/browser_use.mdx)** - Web browsing and interaction - -## Quick Start - -### Using Individual Tools - -```python -from openhands.sdk import Agent, LLM -from openhands.tools import BashTool, FileEditorTool 
-from pydantic import SecretStr - -agent = Agent( - llm=LLM( - model="anthropic/claude-sonnet-4-20250514", - api_key=SecretStr("your-api-key") - ), - tools=[ - BashTool.create(), - FileEditorTool.create() - ] -) -``` - -### Using Tool Presets - -```python -from openhands.tools.preset import get_default_tools, get_planning_tools - -# Default toolset for general tasks -default_tools = get_default_tools() - -# Specialized toolset for planning workflows -planning_tools = get_planning_tools() - -agent = Agent(llm=llm, tools=default_tools) -``` - -## Tool Structure - -All tools follow a consistent structure: - -```mermaid -graph TD - Tool[Tool Definition] --> Action[Action Model] - Tool --> Observation[Observation Model] - Tool --> Executor[Executor Implementation] - - Action --> Params[Input Parameters] - Observation --> Result[Output Data] - Executor --> Execute["execute() method"] - - style Tool fill:#e1f5fe - style Action fill:#fff3e0 - style Observation fill:#e8f5e8 - style Executor fill:#f3e5f5 -``` - -### Tool Components - -1. **Action**: Input model defining tool parameters -2. **Observation**: Output model containing execution results -3. 
**Executor**: Implementation that executes the tool logic - -## Tool Presets - -**Source**: [`openhands/tools/preset/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/preset) - -### Default Preset - -**Source**: [`openhands/tools/preset/default.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/default.py) - -General-purpose toolset for most tasks: - -```python -from openhands.tools.preset import get_default_tools - -tools = get_default_tools() -# Includes: BashTool, FileEditorTool, GlobTool, GrepTool -``` - -### Planning Preset - -**Source**: [`openhands/tools/preset/planning.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py) - -Optimized for planning and multi-file workflows: - -```python -from openhands.tools.preset import get_planning_tools - -tools = get_planning_tools() -# Includes: BashTool, PlanningFileEditorTool, GlobTool, GrepTool, TaskTrackerTool -``` - -## Creating Custom Tools - -See the [Tool Definition Guide](/sdk/architecture/sdk/tool.mdx) for creating custom tools. - -## Tool Security - -Tools support security risk assessment: - -```python -from openhands.sdk.security import LLMSecurityAnalyzer, ConfirmOnHighRisk - -agent = Agent( - llm=llm, - tools=[BashTool.create(), FileEditorTool.create()], - security_analyzer=LLMSecurityAnalyzer(llm=llm), - confirmation_policy=ConfirmOnHighRisk() -) -``` - -See [Security Documentation](/sdk/architecture/sdk/security.mdx) for more details. - -## Tool Configuration - -### Working Directory - -Most tools operate relative to a working directory: - -```python -from openhands.tools import BashTool - -bash_tool = BashTool.create(working_dir="/project/path") -``` - -### Timeout Settings - -Configure execution timeouts: - -```python -from openhands.tools import BashTool - -bash_tool = BashTool.create(timeout=60.0) # 60 seconds -``` - -## Best Practices - -1. **Use Presets**: Start with tool presets for common workflows -2. 
**Configure Timeouts**: Set appropriate timeouts for tools -3. **Provide Context**: Use working directories effectively -4. **Enable Security**: Add security analysis for sensitive operations -5. **Filter Tools**: Use `filter_tools_regex` to limit available tools -6. **Test Locally**: Verify tools work in your environment - -## Tool Examples - -Each tool has comprehensive examples: - -- **[Bash Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Command execution -- **[File Editor Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - File manipulation -- **[Planning Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Planning workflows -- **[Task Tracker Examples](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py)** - Task management - -## See Also - -- **[Tool Definition](/sdk/architecture/sdk/tool.mdx)** - Creating custom tools -- **[Agent Configuration](/sdk/architecture/sdk/agent.mdx)** - Using tools with agents -- **[Security](/sdk/architecture/sdk/security.mdx)** - Tool security -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples)** - Complete examples diff --git a/sdk/arch/tools/planning_file_editor.mdx b/sdk/arch/tools/planning_file_editor.mdx deleted file mode 100644 index e176c93b..00000000 --- a/sdk/arch/tools/planning_file_editor.mdx +++ /dev/null @@ -1,128 +0,0 @@ ---- -title: PlanningFileEditorTool -description: Multi-file editing tool optimized for planning workflows with batch operations. ---- - -PlanningFileEditorTool extends FileEditorTool with multi-file editing capabilities optimized for planning agent workflows. 
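The planning loop this tool targets alternates between editing files and updating task state. As a toy, SDK-free sketch of that bookkeeping (the dicts below stand in for the SDK's `Task` model and are not its real API):

```python
# Toy stand-in for the plan -> edit -> update loop (not the SDK's Task model).
tasks = [
    {"title": "Create config file", "status": "todo"},
    {"title": "Create main script", "status": "todo"},
]

def set_status(tasks, title, status):
    # Mirrors re-submitting the full task list with one status changed.
    for task in tasks:
        if task["title"] == title:
            task["status"] = status

set_status(tasks, "Create config file", "in_progress")
# ... file edits would happen here via PlanningFileEditorTool ...
set_status(tasks, "Create config file", "done")

print([t["status"] for t in tasks])  # ['done', 'todo']
```

In the real workflow the agent performs the equivalent updates by sending a fresh `task_list` with each `plan` command, as shown in the examples below.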
- -**Source**: [`openhands/tools/planning_file_editor/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/planning_file_editor) - -## Overview - -PlanningFileEditorTool provides: -- All FileEditorTool capabilities -- Optimized for planning workflows -- Batch file operations -- Coordination with TaskTrackerTool - -## Usage - -```python -from openhands.tools import PlanningFileEditorTool - -agent = Agent(llm=llm, tools=[PlanningFileEditorTool.create()]) -``` - -## Relation to FileEditorTool - -PlanningFileEditorTool inherits all FileEditorTool commands: -- `view`: View file contents -- `create`: Create new files -- `str_replace`: Replace strings -- `insert`: Insert lines -- `undo_edit`: Undo changes - -See [FileEditorTool](/sdk/architecture/tools/file_editor.mdx) for detailed command documentation. - -## Planning Workflow Integration - -```mermaid -graph TD - Plan[Create Task Plan] --> TaskTracker[TaskTrackerTool] - TaskTracker --> Edit[Edit Files] - Edit --> PlanningEditor[PlanningFileEditorTool] - PlanningEditor --> UpdateTasks[Update Task Status] - UpdateTasks --> TaskTracker - - style Plan fill:#fff3e0 - style Edit fill:#e1f5fe - style UpdateTasks fill:#e8f5e8 -``` - -## Usage in Planning Workflows - -See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): - -```python -from openhands.tools.preset import get_planning_tools - -# Get planning toolset (includes PlanningFileEditorTool) -tools = get_planning_tools() - -agent = Agent(llm=llm, tools=tools) -``` - -## Multi-File Workflow Example - -```python -# 1. Plan tasks -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Create config file", status="todo"), - Task(title="Create main script", status="todo"), - Task(title="Create tests", status="todo") - ] -) - -# 2. 
Create files -PlanningFileEditAction( - command="create", - path="config.yaml", - file_text="settings:\n debug: true\n" -) - -PlanningFileEditAction( - command="create", - path="main.py", - file_text="import yaml\n\nif __name__ == '__main__':\n pass\n" -) - -# 3. Update task status -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Create config file", status="done"), - Task(title="Create main script", status="done"), - Task(title="Create tests", status="in_progress") - ] -) -``` - -## Best Practices - -1. **Use with TaskTrackerTool**: Coordinate file edits with task status -2. **Plan Before Editing**: Create task plan first -3. **Update Progress**: Mark tasks complete after edits -4. **Follow Workflow**: Plan → Edit → Update → Repeat -5. **Use Planning Preset**: Get all planning tools together - -## When to Use - -Use PlanningFileEditorTool when: -- Building complex multi-file projects -- Following structured planning workflows -- Coordinating with task tracking -- Need agent to manage implementation phases - -Use regular FileEditorTool for: -- Simple file editing tasks -- Single-file modifications -- Ad-hoc editing without planning - -## See Also - -- **[FileEditorTool](/sdk/architecture/tools/file_editor.mdx)** - Base file editing capabilities -- **[TaskTrackerTool](/sdk/architecture/tools/task_tracker.mdx)** - Task management -- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Complete planning toolset -- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Full workflow example diff --git a/sdk/arch/tools/task_tracker.mdx b/sdk/arch/tools/task_tracker.mdx deleted file mode 100644 index 73966ef4..00000000 --- a/sdk/arch/tools/task_tracker.mdx +++ /dev/null @@ -1,146 +0,0 @@ ---- -title: TaskTrackerTool -description: Track and manage agent tasks with status updates and structured task lists. 
---- - -TaskTrackerTool enables agents to create, update, and manage task lists for complex multi-step workflows. - -**Source**: [`openhands/tools/task_tracker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/tools/task_tracker) - -## Usage - -```python -from openhands.tools import TaskTrackerTool - -agent = Agent(llm=llm, tools=[TaskTrackerTool.create()]) -``` - -## Action Model - -**Source**: [`openhands/tools/task_tracker/definition.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/task_tracker/definition.py) - -```python -class TaskTrackerAction(Action): - command: Literal["view", "plan"] - task_list: list[Task] | None = None # For plan command -``` - -### Task Model - -```python -class Task: - title: str # Task title - status: Literal["todo", "in_progress", "done"] # Task status - notes: str | None = None # Optional notes -``` - -## Observation Model - -```python -class TaskTrackerObservation(Observation): - task_list: list[Task] # Current task list - message: str # Status message -``` - -## Commands - -### view -View current task list. - -```python -TaskTrackerAction(command="view") -``` - -### plan -Create or update task list. 
- -```python -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Setup environment", status="done"), - Task(title="Write code", status="in_progress"), - Task(title="Run tests", status="todo") - ] -) -``` - -## Usage Patterns - -### Initialize Task List - -```python -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Analyze requirements", status="todo"), - Task(title="Design solution", status="todo"), - Task(title="Implement features", status="todo"), - Task(title="Write tests", status="todo"), - Task(title="Deploy", status="todo") - ] -) -``` - -### Update Progress - -```python -TaskTrackerAction( - command="plan", - task_list=[ - Task(title="Analyze requirements", status="done"), - Task(title="Design solution", status="in_progress"), - Task(title="Implement features", status="todo"), - Task(title="Write tests", status="todo"), - Task(title="Deploy", status="todo") - ] -) -``` - -### Check Current Status - -```python -TaskTrackerAction(command="view") -# Returns current task list with status -``` - -## Best Practices - -1. **Plan Early**: Create task list at workflow start -2. **Update Regularly**: Mark tasks as progress happens -3. **Use Notes**: Add details for complex tasks -4. **One Task Active**: Focus on one "in_progress" task -5. 
**Mark Complete**: Set "done" when finished - -## Task Status Workflow - -```mermaid -graph LR - TODO[todo] -->|Start work| PROGRESS[in_progress] - PROGRESS -->|Complete| DONE[done] - DONE -->|Reopen if needed| TODO - - style TODO fill:#fff3e0 - style PROGRESS fill:#e1f5fe - style DONE fill:#c8e6c9 -``` - -## Example: Planning Agent - -See [`examples/01_standalone_sdk/24_planning_agent_workflow.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py): - -```python -# Planning agent uses TaskTrackerTool for workflow management -from openhands.tools.preset import get_planning_tools - -agent = Agent( - llm=llm, - tools=get_planning_tools() # Includes TaskTrackerTool -) -``` - -## See Also - -- **[PlanningFileEditorTool](/sdk/architecture/tools/planning_file_editor.mdx)** - Multi-file editing for planning -- **[Planning Preset](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/tools/preset/planning.py)** - Planning toolset -- **[Planning Example](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py)** - Complete workflow diff --git a/sdk/arch/workspace/docker.mdx b/sdk/arch/workspace/docker.mdx deleted file mode 100644 index 4c26fd52..00000000 --- a/sdk/arch/workspace/docker.mdx +++ /dev/null @@ -1,330 +0,0 @@ ---- -title: DockerWorkspace -description: Execute agent operations in isolated Docker containers with automatic container lifecycle management. ---- - -DockerWorkspace provides isolated execution environments using Docker containers. It automatically manages container lifecycle, networking, and resource allocation. 
- -**Source**: [`openhands/workspace/docker/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/docker) - -## Overview - -DockerWorkspace provides: -- Automatic container creation and cleanup -- Network isolation and port management -- Custom or pre-built Docker images -- Environment variable forwarding -- File system mounting -- Resource limits and controls - -## Usage - -### Basic Usage - -```python -from openhands.workspace import DockerWorkspace - -workspace = DockerWorkspace( - working_dir="/workspace", - base_image="python:3.12" -) - -with workspace: - result = workspace.execute_command("python --version") - print(result.stdout) # Python 3.12.x -``` - -### With Pre-built Image - -```python -workspace = DockerWorkspace( - working_dir="/workspace", - server_image="ghcr.io/all-hands-ai/agent-server:latest" -) -``` - -## Configuration - -**Source**: [`openhands/workspace/docker/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/docker/workspace.py) - -### Core Parameters - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `working_dir` | `str` | `"/workspace"` | Working directory in container | -| `base_image` | `str \| None` | `None` | Base image to build agent server from | -| `server_image` | `str \| None` | `None` | Pre-built agent server image | -| `host_port` | `int \| None` | `None` | Host port to bind (auto-assigned if None) | -| `forward_env` | `list[str]` | `["DEBUG"]` | Environment variables to forward | -| `container_name` | `str \| None` | `None` | Container name (auto-generated if None) | -| `platform` | `str \| None` | `None` | Target platform (e.g., "linux/amd64") | - -### Using Base Image - -Build agent server on top of custom base image: - -```python -workspace = DockerWorkspace( - base_image="ubuntu:22.04", - working_dir="/workspace" -) -``` - -Agent server components are installed on top of the base image. 
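Conceptually, the build layers the agent server onto whatever image you supply, roughly equivalent to a Dockerfile like the following. This is an illustrative sketch only; the actual build logic lives in `openhands/agent_server/docker/`:

```dockerfile
# Hypothetical sketch of what DockerWorkspace builds from base_image.
FROM ubuntu:22.04

# A Python runtime and supporting packages are layered on top...
RUN apt-get update && apt-get install -y python3 python3-pip
# ...followed by the agent server itself and its entrypoint.
```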
- -### Using Pre-built Server Image - -Use pre-built agent server image: - -```python -workspace = DockerWorkspace( - server_image="ghcr.io/all-hands-ai/agent-server:latest", - working_dir="/workspace" -) -``` - -Faster startup, no build time required. - -## Lifecycle Management - -### Automatic Cleanup - -```python -with DockerWorkspace(base_image="python:3.12") as workspace: - # Container created - workspace.execute_command("pip install requests") - # Commands execute in container -# Container automatically stopped and removed -``` - -### Manual Management - -```python -workspace = DockerWorkspace(base_image="python:3.12") - -# Manually start (happens automatically in context manager) -# Use workspace -result = workspace.execute_command("ls") - -# Manually cleanup -workspace.__exit__(None, None, None) -``` - -## Environment Configuration - -### Forward Environment Variables - -```python -import os - -os.environ["DATABASE_URL"] = "postgres://..." -os.environ["API_KEY"] = "secret" - -workspace = DockerWorkspace( - base_image="python:3.12", - forward_env=["DATABASE_URL", "API_KEY", "DEBUG"] -) - -with workspace: - result = workspace.execute_command("echo $DATABASE_URL") - # Outputs: postgres://... -``` - -### Custom Container Name - -```python -workspace = DockerWorkspace( - base_image="python:3.12", - container_name="my-agent-container" -) -``` - -Useful for debugging and monitoring. - -### Platform Specification - -```python -workspace = DockerWorkspace( - base_image="python:3.12", - platform="linux/amd64" # Force specific platform -) -``` - -Useful for Apple Silicon Macs running amd64 images. 
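If you want to decide at runtime whether to force `linux/amd64`, a small stdlib check works. The mapping below is an assumption for illustration, not SDK behavior:

```python
import platform

# Hosts reporting arm64/aarch64 (e.g. Apple Silicon) may need x86 images
# run under emulation; other hosts can use the image's native platform.
machine = platform.machine().lower()
docker_platform = "linux/amd64" if machine in ("arm64", "aarch64") else None
print(machine, docker_platform)
```

The resulting value could then be passed as the `platform` argument when constructing `DockerWorkspace`.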
- -## Port Management - -DockerWorkspace automatically finds available ports for container communication: - -```python -workspace = DockerWorkspace( - base_image="python:3.12", - host_port=None # Auto-assign (default) -) - -# Or specify explicit port -workspace = DockerWorkspace( - base_image="python:3.12", - host_port=8000 # Use specific port -) -``` - -## File Operations - -### File Upload - -```python -workspace.file_upload( - source_path="local_file.txt", - destination_path="/workspace/file.txt" -) -``` - -### File Download - -```python -workspace.file_download( - source_path="/workspace/output.txt", - destination_path="local_output.txt" -) -``` - -## Building Docker Images - -DockerWorkspace can build custom agent server images: - -**Source**: [`openhands/agent_server/docker/build.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/agent_server/docker/build.py) - -```python -from openhands.agent_server.docker.build import ( - BuildOptions, - build -) - -# Build custom image -image_name = build( - BuildOptions( - base_image="ubuntu:22.04", - target="runtime", # or "dev" - platform="linux/amd64", - context_dir="." - ) -) - -# Use built image -workspace = DockerWorkspace(server_image=image_name) -``` - -## Use with Conversation - -```python -from openhands.sdk import Agent, Conversation -from openhands.tools import BashTool, FileEditorTool -from openhands.workspace import DockerWorkspace - -# Create workspace -workspace = DockerWorkspace( - base_image="python:3.12", - working_dir="/workspace" -) - -# Create agent -agent = Agent( - llm=llm, - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Use in conversation -with workspace: - conversation = Conversation(agent=agent, workspace=workspace) - conversation.send_message("Create a Python web scraper") - conversation.run() -``` - -See [`examples/02_remote_agent_server/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server) for complete examples. 
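With `host_port=None`, a free host port is chosen automatically. One common way to do this — shown purely as an illustration, not necessarily how DockerWorkspace implements it — is to bind to port 0 and let the operating system pick:

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]  # the port the OS actually assigned


port = find_free_port()
assert 0 < port < 65536
```

This approach has an inherent race: the port can be claimed by another process between discovery and use, which is one reason to pin `host_port` explicitly in contention-heavy environments.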
-
-## Security Benefits
-
-### Isolation
-
-- **Process Isolation**: Container runs separately from host
-- **File System Isolation**: Limited access to host file system
-- **Network Isolation**: Separate network namespace
-
-### Resource Limits
-
-```python
-# Resource limits are configurable via Docker
-# Set through Docker API or Dockerfile
-```
-
-### Sandboxing
-
-DockerWorkspace provides strong sandboxing:
-- Agent cannot access host file system
-- Agent cannot interfere with host processes
-- Agent operates in controlled environment
-
-## Performance Considerations
-
-### Container Startup Time
-
-- **Base Image Build**: 30-60 seconds (first time)
-- **Pre-built Image**: 5-10 seconds
-- **Subsequent Runs**: Uses cached images
-
-### Optimization Tips
-
-1. **Use Pre-built Images**: Faster than building from base image
-2. **Cache Base Images**: Docker caches layers
-3. **Minimize Image Size**: Smaller images start faster
-4. **Reuse Containers**: For multiple operations (advanced)
-
-## Troubleshooting
-
-### Container Fails to Start
-
-```bash
-# Check Docker is running
-docker ps
-
-# Check logs
-docker logs <container-name>
-
-# Verify image exists
-docker images
-```
-
-### Port Already in Use
-
-```python
-# Specify different port
-workspace = DockerWorkspace(
-    base_image="python:3.12",
-    host_port=8001  # Use alternative port
-)
-```
-
-### Permission Issues
-
-```bash
-# Ensure Docker has necessary permissions
-# On Linux, add user to docker group:
-sudo usermod -aG docker $USER
-```
-
-## Best Practices
-
-1. **Use Context Managers**: Always use `with` statement
-2. **Pre-build Images**: Build agent server images ahead of time
-3. **Set Resource Limits**: Configure appropriate limits
-4. **Monitor Containers**: Track resource usage
-5. **Clean Up**: Ensure containers are removed after use
-6.
**Use Specific Tags**: Pin image versions for reproducibility - -## See Also - -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - API-based remote execution -- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server running in container -- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface -- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Docker workspace examples diff --git a/sdk/arch/workspace/overview.mdx b/sdk/arch/workspace/overview.mdx deleted file mode 100644 index 6a539776..00000000 --- a/sdk/arch/workspace/overview.mdx +++ /dev/null @@ -1,99 +0,0 @@ ---- -title: Workspace Package Overview -description: Advanced workspace implementations providing sandboxed and remote execution environments. ---- - -The `openhands.workspace` package provides advanced workspace implementations for production deployments, including Docker-based sandboxing and remote API execution. 
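The package's implementations share a small inheritance tree rooted at `BaseWorkspace`. A name-only sketch with stand-in classes can make the relationships concrete (these are placeholders for illustration, not the SDK's real class definitions):

```python
class BaseWorkspace:  # core interface (lives in the SDK)
    pass


class LocalWorkspace(BaseWorkspace):  # direct local execution
    pass


class RemoteWorkspace(BaseWorkspace):  # base for remote implementations
    pass


class DockerWorkspace(RemoteWorkspace):  # Docker container execution
    pass


class RemoteAPIWorkspace(RemoteWorkspace):  # API-based remote execution
    pass


assert issubclass(DockerWorkspace, BaseWorkspace)
```

Because both remote variants derive from `RemoteWorkspace`, code written against that base can swap Docker-backed and API-backed execution without changes.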
-
-**Source**: [`openhands/workspace/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace)
-
-## Available Workspaces
-
-- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker container isolation
-- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote server execution
-
-## Workspace Hierarchy
-
-```mermaid
-graph TD
-    Base[BaseWorkspace] --> Local[LocalWorkspace]
-    Base --> Remote[RemoteWorkspace]
-    Remote --> Docker[DockerWorkspace]
-    Remote --> API[RemoteAPIWorkspace]
-
-    style Base fill:#e1f5fe
-    style Local fill:#e8f5e8
-    style Remote fill:#fff3e0
-    style Docker fill:#f3e5f5
-    style API fill:#f3e5f5
-```
-
-- **BaseWorkspace**: Core interface (in SDK)
-- **LocalWorkspace**: Direct local execution (in SDK)
-- **RemoteWorkspace**: Base for remote implementations
-- **DockerWorkspace**: Docker container execution
-- **RemoteAPIWorkspace**: API-based remote execution
-
-## Comparison
-
-| Feature | LocalWorkspace | DockerWorkspace | RemoteAPIWorkspace |
-|---------|---------------|-----------------|-------------------|
-| **Isolation** | None | Strong | Strong |
-| **Performance** | Fast | Good | Network latency |
-| **Setup** | None | Docker required | Server required |
-| **Security** | Host system | Sandboxed | Sandboxed |
-| **Use Case** | Development | Production/Testing | Distributed systems |
-
-## Quick Start
-
-### Docker Workspace
-
-```python
-from openhands.workspace import DockerWorkspace
-
-workspace = DockerWorkspace(
-    working_dir="/workspace",
-    base_image="ubuntu:22.04"
-)
-
-with workspace:
-    result = workspace.execute_command("echo 'Hello from Docker'")
-    print(result.stdout)
-```
-
-### Remote API Workspace
-
-```python
-from openhands.workspace import RemoteAPIWorkspace
-
-workspace = RemoteAPIWorkspace(
-    working_dir="/workspace",
-    api_url="https://agent-server.example.com",
-    api_key="your-api-key"
-)
-
-with workspace:
-    result = workspace.execute_command("python script.py")
-
print(result.stdout) -``` - -## Use Cases - -### Development -Use `LocalWorkspace` for local development and testing. - -### Testing -Use `DockerWorkspace` for isolated test environments. - -### Production -Use `DockerWorkspace` or `RemoteAPIWorkspace` for production deployments. - -### Multi-User Systems -Use `RemoteAPIWorkspace` with centralized agent server. - -## See Also - -- **[SDK Workspace Interface](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface -- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Docker implementation -- **[RemoteAPIWorkspace](/sdk/architecture/workspace/remote_api.mdx)** - Remote API implementation -- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server for remote workspaces diff --git a/sdk/arch/workspace/remote_api.mdx b/sdk/arch/workspace/remote_api.mdx deleted file mode 100644 index cb8ca8a4..00000000 --- a/sdk/arch/workspace/remote_api.mdx +++ /dev/null @@ -1,325 +0,0 @@ ---- -title: RemoteAPIWorkspace -description: Connect to centralized agent servers via HTTP API for scalable distributed agent execution. ---- - -RemoteAPIWorkspace enables agent execution on remote servers through HTTP APIs. It's designed for production deployments requiring centralized agent management and multi-user support. 
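Every request to the remote server carries the API key as a bearer token. Building those headers is simple but worth seeing concretely; `auth_headers` is an illustrative helper, and the `Content-Type` value is an assumption for JSON payloads, not documented SDK behavior:

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Headers for an authenticated JSON request to the agent server."""
    return {
        "Authorization": f"Bearer {api_key}",  # key from RemoteAPIWorkspace(api_key=...)
        "Content-Type": "application/json",  # assumed; check the server's API docs
    }


print(auth_headers("sk-abc123")["Authorization"])  # Bearer sk-abc123
```

Any HTTP client (e.g. `httpx` or `requests`) can pass such a dict via its `headers=` argument when talking to the server directly.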
- -**Source**: [`openhands/workspace/remote_api/`](https://github.com/All-Hands-AI/agent-sdk/tree/main/openhands/workspace/remote_api) - -## Overview - -RemoteAPIWorkspace provides: -- HTTP API communication with agent server -- Authentication and authorization -- Centralized resource management -- Multi-user agent execution -- Monitoring and logging - -## Usage - -### Basic Usage - -```python -from openhands.workspace import RemoteAPIWorkspace - -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com", - api_key="your-api-key" -) - -with workspace: - result = workspace.execute_command("python script.py") - print(result.stdout) -``` - -### With Agent - -```python -from openhands.sdk import Agent, Conversation -from openhands.tools import BashTool, FileEditorTool - -# Create workspace -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://agent-server.example.com", - api_key="your-api-key" -) - -# Create agent -agent = Agent( - llm=llm, - tools=[BashTool.create(), FileEditorTool.create()] -) - -# Use in conversation -conversation = Conversation(agent=agent, workspace=workspace) -conversation.send_message("Your task") -conversation.run() -``` - -## Configuration - -**Source**: [`openhands/workspace/remote_api/workspace.py`](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/workspace/remote_api/workspace.py) - -### Parameters - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `working_dir` | `str` | Yes | Working directory on server | -| `api_url` | `str` | Yes | Agent server API URL | -| `api_key` | `str` | Yes | Authentication API key | -| `timeout` | `float` | No | Request timeout (default: 30) | - -### Example Configuration - -```python -workspace = RemoteAPIWorkspace( - working_dir="/workspace/user123", - api_url="https://agents.company.com", - api_key="sk-abc123...", - timeout=60.0 # 60 second timeout -) -``` - -## API 
Communication - -### HTTP Endpoints - -RemoteAPIWorkspace communicates with agent server endpoints: - -- `POST /api/workspace/command` - Execute commands -- `POST /api/workspace/upload` - Upload files -- `GET /api/workspace/download` - Download files -- `GET /api/health` - Health check - -### Authentication - -```python -# API key passed in Authorization header -headers = { - "Authorization": f"Bearer {api_key}" -} -``` - -### Error Handling - -```python -try: - result = workspace.execute_command("command") -except ConnectionError: - print("Failed to connect to agent server") -except TimeoutError: - print("Request timed out") -except Exception as e: - print(f"Execution error: {e}") -``` - -## File Operations - -### Upload Files - -```python -workspace.file_upload( - source_path="local_data.csv", - destination_path="/workspace/data.csv" -) -``` - -### Download Files - -```python -workspace.file_download( - source_path="/workspace/results.json", - destination_path="local_results.json" -) -``` - -### Large File Transfer - -```python -# Chunked upload for large files -workspace.file_upload( - source_path="large_dataset.zip", - destination_path="/workspace/dataset.zip" -) -``` - -## Architecture - -```mermaid -graph LR - Client[Client SDK] -->|HTTPS| API[Agent Server API] - API --> Container1[Container 1] - API --> Container2[Container 2] - API --> Container3[Container 3] - - Container1 --> Agent1[Agent] - Container2 --> Agent2[Agent] - Container3 --> Agent3[Agent] - - style Client fill:#e1f5fe - style API fill:#fff3e0 - style Container1 fill:#e8f5e8 - style Container2 fill:#e8f5e8 - style Container3 fill:#e8f5e8 -``` - -## Use Cases - -### Multi-User Platform - -```python -# Each user gets isolated workspace -user_workspace = RemoteAPIWorkspace( - working_dir=f"/workspace/{user_id}", - api_url="https://agents.platform.com", - api_key=user_api_key -) -``` - -### Scalable Agent Execution - -```python -# Server manages resource allocation -# Multiple agents run 
concurrently -# Automatic load balancing -``` - -### Centralized Monitoring - -```python -# Server tracks: -# - Resource usage per user -# - Agent execution logs -# - API usage metrics -# - Error rates and debugging info -``` - -## Security - -### Authentication - -- API key-based authentication -- Per-user access control -- Token expiration and rotation - -### Isolation - -- Separate workspaces per user -- Container-based sandboxing -- Network isolation - -### Data Protection - -- HTTPS communication -- Encrypted data transfer -- Secure file storage - -## Performance Considerations - -### Network Latency - -```python -# Latency depends on: -# - Network connection -# - Geographic distance -# - Server load - -# Optimization: -# - Use regional servers -# - Batch operations -# - Cache frequently accessed data -``` - -### Concurrent Execution - -```python -# Server handles concurrent requests -# Multiple users can run agents simultaneously -# Automatic resource management -``` - -## Deployment - -### Running Agent Server - -See [Agent Server Documentation](/sdk/architecture/agent_server/overview.mdx) for server setup: - -```bash -# Start agent server -docker run -d \ - -p 8000:8000 \ - -e API_KEY=your-secret-key \ - ghcr.io/all-hands-ai/agent-server:latest -``` - -### Using Deployed Server - -```python -# Client connects to deployed server -workspace = RemoteAPIWorkspace( - working_dir="/workspace", - api_url="https://your-server.com", - api_key="your-secret-key" -) -``` - -## Comparison with DockerWorkspace - -| Feature | DockerWorkspace | RemoteAPIWorkspace | -|---------|-----------------|-------------------| -| **Setup** | Local Docker | Remote server | -| **Network** | Local | Internet required | -| **Scaling** | Single machine | Multiple users | -| **Management** | Client-side | Server-side | -| **Latency** | Low | Network dependent | -| **Use Case** | Local dev/test | Production | - -## Best Practices - -1. **Use HTTPS**: Always use secure connections -2. 
**Rotate API Keys**: Regularly update authentication
-3. **Set Timeouts**: Configure appropriate timeouts
-4. **Handle Network Errors**: Implement retry logic
-5. **Monitor Usage**: Track API calls and resource usage
-6. **Regional Deployment**: Use nearby servers for lower latency
-7. **Batch Operations**: Combine multiple operations when possible
-
-## Troubleshooting
-
-### Connection Failures
-
-```python
-# Verify server is reachable
-import requests
-response = requests.get(f"{api_url}/api/health")
-print(response.status_code)  # Should be 200
-```
-
-### Authentication Errors
-
-```python
-# Verify API key is correct
-# Check key has not expired
-# Ensure proper authorization headers
-```
-
-### Timeout Issues
-
-```python
-# Increase timeout for long operations
-workspace = RemoteAPIWorkspace(
-    api_url=api_url,
-    api_key=api_key,
-    timeout=120.0  # 2 minutes
-)
-```
-
-## See Also
-
-- **[DockerWorkspace](/sdk/architecture/workspace/docker.mdx)** - Local Docker execution
-- **[Agent Server](/sdk/architecture/agent_server/overview.mdx)** - Server implementation
-- **[SDK Workspace](/sdk/architecture/sdk/workspace.mdx)** - Base workspace interface
-- **[Examples](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/02_remote_agent_server)** - Remote workspace examples

From dc53f3131d351cef518c50ebc2e36d729f23d527 Mon Sep 17 00:00:00 2001
From: Xingyao Wang
Date: Wed, 22 Oct 2025 17:43:27 -0400
Subject: [PATCH 37/58] allow sync code block to work with arbitrary file

---
 .github/scripts/sync_code_blocks.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/scripts/sync_code_blocks.py b/.github/scripts/sync_code_blocks.py
index 2ba2e759..93befc54 100755
--- a/.github/scripts/sync_code_blocks.py
+++ b/.github/scripts/sync_code_blocks.py
@@ -37,8 +37,8 @@ def extract_code_blocks(content: str) -> list[tuple[str, str, int, int]]:
     ```
     """
-    # Captures examples/...*.py after the first line, then the body up to ```
-    pattern =
r'```python[^\n]*\s+(examples/[^\s]+\.py)\n(.*?)```' + # Captures ...*.py after the first line, then the body up to ``` + pattern = r'```python[^\n]*\s+([^\s]+\.py)\n(.*?)```' matches: list[tuple[str, str, int, int]] = [] for match in re.finditer(pattern, content, re.DOTALL): file_ref = match.group(1) From 7616ccfe10d87583674811a04e9d5d20fdefa10f Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 17:54:00 -0400 Subject: [PATCH 38/58] fix --- docs.json | 13 +- .../github-workflows/todo-management.mdx | 564 ++++++++++++++++++ 2 files changed, 572 insertions(+), 5 deletions(-) create mode 100644 sdk/guides/github-workflows/todo-management.mdx diff --git a/docs.json b/docs.json index 9b3f803b..00574e7a 100644 --- a/docs.json +++ b/docs.json @@ -242,11 +242,14 @@ { "group": "Architecture", "pages": [ - "sdk/arch/overview", - "sdk/arch/sdk-package", - "sdk/arch/tools-package", - "sdk/arch/workspace-package", - "sdk/arch/agent-server-package" + { + "group": "Language Models", + "pages": [ + "sdk/arch/llms/index", + "sdk/arch/llms/configuration", + "sdk/arch/llms/providers" + ] + } ] } ] diff --git a/sdk/guides/github-workflows/todo-management.mdx b/sdk/guides/github-workflows/todo-management.mdx new file mode 100644 index 00000000..b22fb354 --- /dev/null +++ b/sdk/guides/github-workflows/todo-management.mdx @@ -0,0 +1,564 @@ +--- +title: TODO Management Workflow +description: Automate TODO implementation with AI-powered code changes using GitHub Actions. +--- + + +This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) + + +Automatically scan your codebase for TODO comments and let the AI agent implement them, creating pull requests with the changes. This showcases practical automation and self-improving codebase capabilities. + +## Quick Start + +```bash +# 1. 
Copy workflow to your repository +cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY + +# 3. Configure GitHub Actions permissions +# Settings → Actions → General → Workflow permissions +# Enable: "Read and write permissions" + "Allow GitHub Actions to create and approve pull requests" + +# 4. Add TODO comments to your code +# Example: # TODO(openhands): Add input validation for user email +``` + +## Features + +- **Smart Scanning** - Finds legitimate TODO comments with configurable identifiers +- **AI Implementation** - Uses OpenHands agent to automatically implement TODOs +- **PR Management** - Creates feature branches and pull requests automatically +- **Progress Tracking** - Tracks TODO processing status and PR creation +- **Comprehensive Reporting** - Detailed GitHub Actions summary with processing status +- **Configurable** - Customizable TODO identifiers and processing limits + +## How It Works + +### 1. Scan Phase + +The workflow scans your codebase for configurable TODO comments: + +```python icon="python" expandable examples/03_github_workflows/03_todo_management/scanner.py +#!/usr/bin/env python3 +""" +TODO Scanner for OpenHands Automated TODO Management + +Scans for configurable TODO comments in Python, TypeScript, Java, and Rust files. 
+Default identifier: TODO(openhands) +""" + +import argparse +import json +import logging +import os +import re +import sys +from pathlib import Path + + +# Configure logging +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", + handlers=[ + # Log to stderr to avoid JSON interference + logging.StreamHandler(sys.stderr), + ], +) +logger = logging.getLogger(__name__) + + +def scan_file_for_todos( + file_path: Path, todo_identifier: str = "TODO(openhands)" +) -> list[dict]: + """Scan a single file for configurable TODO comments.""" + # Only scan specific file extensions + if file_path.suffix.lower() not in {".py", ".ts", ".java", ".rs"}: + logger.debug(f"Skipping file {file_path} (unsupported extension)") + return [] + + # Skip test files and example files that contain mock TODOs + file_str = str(file_path) + if ( + "/test" in file_str + or "/tests/" in file_str + or "test_" in file_path.name + # Skip examples + or "examples/03_github_workflows/03_todo_management/" in file_str + ): + logger.debug(f"Skipping test/example file: {file_path}") + return [] + + logger.debug(f"Scanning file: {file_path}") + + try: + with open(file_path, encoding="utf-8", errors="ignore") as f: + lines = f.readlines() + except (OSError, UnicodeDecodeError) as e: + logger.warning(f"Failed to read file {file_path}: {e}") + return [] + + todos = [] + # Escape special regex characters in the identifier + escaped_identifier = re.escape(todo_identifier) + todo_pattern = re.compile(rf"{escaped_identifier}(?::\s*(.*))?", re.IGNORECASE) + + for line_num, line in enumerate(lines, 1): + match = todo_pattern.search(line) + if match: + # Extract initial description from the TODO line + description = match.group(1).strip() if match.group(1) else "" + + # Look ahead for continuation lines that are also comments + continuation_lines = [] + for next_line_idx in range(line_num, len(lines)): + next_line = lines[next_line_idx] + next_stripped = 
next_line.strip() + + # Check if this line is a comment continuation + if ( + next_stripped.startswith("#") + and not next_stripped.startswith(f"# {todo_identifier}") + # Skip empty comment lines + and next_stripped != "#" + # Must have content after # + and len(next_stripped) > 1 + ): + # Extract comment content (remove # and leading whitespace) + comment_content = next_stripped[1:].strip() + + if comment_content: # Only add non-empty content + continuation_lines.append(comment_content) + elif next_stripped == "#": + # Empty comment line - continue looking + continue + else: + # Stop at first non-comment line + break + + # Combine description with continuation lines + if continuation_lines: + if description: + full_description = description + " " + " ".join(continuation_lines) + else: + full_description = " ".join(continuation_lines) + else: + full_description = description + + todo_item = { + "file": str(file_path), + "line": line_num, + "description": full_description, + } + todos.append(todo_item) + logger.info(f"Found TODO in {file_path}:{line_num}: {full_description}") + + if todos: + logger.info(f"Found {len(todos)} TODO(s) in {file_path}") + return todos + + +def scan_directory( + directory: Path, todo_identifier: str = "TODO(openhands)" +) -> list[dict]: + """Recursively scan a directory for configurable TODO comments.""" + logger.info(f"Scanning directory: {directory}") + all_todos = [] + + for root, dirs, files in os.walk(directory): + # Skip hidden and common ignore directories + dirs[:] = [ + d + for d in dirs + if not d.startswith(".") + and d + not in { + "__pycache__", + "node_modules", + ".venv", + "venv", + "build", + "dist", + } + ] + + for file in files: + file_path = Path(root) / file + todos = scan_file_for_todos(file_path, todo_identifier) + all_todos.extend(todos) + + return all_todos + + +def main(): + """Main function to scan for TODOs and output results.""" + parser = argparse.ArgumentParser( + description="Scan codebase for configurable 
TODO comments" + ) + parser.add_argument( + "directory", + nargs="?", + default=".", + help="Directory to scan (default: current directory)", + ) + parser.add_argument("--output", "-o", help="Output file (default: stdout)") + parser.add_argument( + "--identifier", + "-i", + default="TODO(openhands)", + help="TODO identifier to search for (default: TODO(openhands))", + ) + + args = parser.parse_args() + + path = Path(args.directory) + if not path.exists(): + logger.error(f"Path '{path}' does not exist") + return 1 + + if path.is_file(): + logger.info(f"Starting TODO scan on file: {path}") + todos = scan_file_for_todos(path, args.identifier) + else: + logger.info(f"Starting TODO scan in directory: {path}") + todos = scan_directory(path, args.identifier) + logger.info(f"Scan complete. Found {len(todos)} total TODO(s)") + output = json.dumps(todos, indent=2) + + if args.output: + with open(args.output, "w", encoding="utf-8") as f: + f.write(output) + print(f"Found {len(todos)} TODO(s), written to {args.output}") + else: + print(output) + + return 0 + + +if __name__ == "__main__": + exit(main()) +``` + +The scanner: +- Default identifier: `TODO(openhands)` (customizable via workflow input) +- Filters out false positives (documentation, test files, quoted strings) +- Supports Python, TypeScript, Java, and Rust files (`.py`, `.ts`, `.java`, `.rs`) +- Provides detailed logging of found TODOs + +Key functions: + +```python +def scan_file_for_todos(file_path: Path, todo_identifier: str = "TODO(openhands)") -> list[dict]: + """Scan a single file for configurable TODO comments.""" +``` + +### 2. Process Phase + +For each TODO found, the agent script implements it: + +```python icon="python" expandable examples/03_github_workflows/03_todo_management/agent_script.py +#!/usr/bin/env python3 +""" +TODO Agent for OpenHands Automated TODO Management + +This script processes individual TODO(openhands) comments using OpenHands agent +to implement the TODO. 
Designed for use with GitHub Actions workflows. + +Usage: + python agent_script.py + +Arguments: + todo_json: JSON string containing TODO information from scanner.py + +Environment Variables: + LLM_API_KEY: API key for the LLM (required) + LLM_MODEL: Language model to use (default: openhands/claude-sonnet-4-5-20250929) + LLM_BASE_URL: Optional base URL for LLM API + GITHUB_TOKEN: GitHub token for creating PRs (required) + GITHUB_REPOSITORY: Repository in format owner/repo (required) + +For setup instructions and usage examples, see README.md in this directory. +""" + +import argparse +import json +import os +import sys + +from prompt import PROMPT + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +def process_todo(todo_data: dict): + """Process a single TODO item using OpenHands agent.""" + file_path = todo_data["file"] + line_num = todo_data["line"] + description = todo_data["description"] + + logger.info(f"Processing TODO in {file_path}:{line_num}") + + # Configure LLM + api_key = os.getenv("LLM_API_KEY") + if not api_key: + logger.error("LLM_API_KEY environment variable is not set.") + sys.exit(1) + + model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") + base_url = os.getenv("LLM_BASE_URL") + + llm_config = { + "model": model, + "api_key": api_key, + "service_id": "agent_script", + "drop_params": True, + } + + if base_url: + llm_config["base_url"] = base_url + + llm = LLM(**llm_config) + + # Create the prompt + prompt = PROMPT.format( + file_path=file_path, + line_num=line_num, + description=description, + ) + + # Get the current working directory as workspace + cwd = os.getcwd() + + # Create agent with default tools + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # Create conversation + conversation = Conversation( + agent=agent, + workspace=cwd, + ) + + logger.info("Starting task execution...") + 
logger.info(f"Prompt: {prompt[:200]}...") + + # Send the prompt and run the agent + conversation.send_message(prompt) + conversation.run() + + logger.info("Task completed successfully") + + +def main(): + """Main function to process a TODO item.""" + parser = argparse.ArgumentParser( + description="Process a TODO(openhands) comment using OpenHands agent" + ) + parser.add_argument("todo_json", help="JSON string containing TODO information") + + args = parser.parse_args() + + try: + todo_data = json.loads(args.todo_json) + except json.JSONDecodeError as e: + logger.error(f"Invalid JSON input: {e}") + sys.exit(1) + + # Validate required fields + required_fields = ["file", "line", "description"] + for field in required_fields: + if field not in todo_data: + logger.error(f"Missing required field in TODO data: {field}") + sys.exit(1) + + # Process the TODO + process_todo(todo_data) + + +if __name__ == "__main__": + main() +``` + +The agent: +1. Configures the LLM using environment variables +2. Creates a prompt with the TODO details +3. Uses the default agent with standard tools +4. Runs the conversation to implement the TODO + +Key configuration: + +```python +# Configure LLM +llm = LLM( + model=model, + api_key=api_key, + service_id="agent_script", + drop_params=True, +) + +# Create agent with default tools +agent = get_default_agent(llm=llm, cli_mode=True) + +# Create conversation +conversation = Conversation(agent=agent, workspace=cwd) +conversation.send_message(prompt) +conversation.run() +``` + +### 3. Prompt Template + +The prompt guides the agent to implement the TODO and create a PR: + +```python icon="python" expandable examples/03_github_workflows/03_todo_management/prompt.py +"""Prompt template for TODO implementation.""" + +PROMPT = """Please implement a TODO comment in a codebase. 
+
+IMPORTANT - Creating a Pull Request:
+- Use the `gh pr create` command to create the PR
+- The GITHUB_TOKEN environment variable is available for authentication
+- PR Title: "[Openhands] {description}"
+- Branch name "openhands/todo/***"
+
+Your task is to:
+1. Analyze the TODO comment and understand what needs to be implemented
+2. Search GitHub for any existing PRs that address this TODO
+   Filter by title [Openhands]... Don't implement anything if such a PR exists
+3. Create a feature branch for this implementation
+4. Implement what is asked by the TODO
+5. Create a pull request with your changes
+6. Add 2 reviewers
+   * Tag the person who wrote the TODO as a reviewer
+   * Read the git blame information for the files, and find the most recent and
+     active contributors to the file/location of the changes.
+     Assign one of these people as a reviewer.
+
+
+Please make sure to:
+- Create a descriptive branch name related to the TODO
+- Fix the issue with clean code
+- Include a test if needed, but not always necessary
+
+TODO Details:
+- File: {file_path}
+- Line: {line_num}
+- Description: {description}
+"""
+```
+
+The prompt instructs the agent to:
+- Analyze the TODO comment
+- Check for existing PRs addressing this TODO
+- Create a feature branch (`openhands/todo/***`)
+- Implement the requested changes
+- Create a pull request with title `[Openhands] {description}`
+- Add reviewers (TODO author and active contributors)
+
+## Usage
+
+### Manual Trigger
+
+1. Go to **Actions** → "Automated TODO Management"
+2. Click **Run workflow**
+3. (Optional) Configure parameters:
+   - **Max TODOs**: Maximum number of TODOs to process (default: 3)
+   - **TODO Identifier**: Custom identifier to search for (default: `TODO(openhands)`)
+4.
Click **Run workflow** + +### Adding TODO Comments + +Add TODO comments in the following format anywhere in your codebase: + +```python +# TODO(openhands): Add input validation for user email +def process_user_email(email): + return email.lower() + +# TODO(openhands): Implement caching mechanism for API responses +def fetch_api_data(endpoint): + # Current implementation without caching + return requests.get(endpoint).json() +``` + +**Supported Comment Styles:** +- `# TODO(openhands): description` (Python, Shell, etc.) +- `// TODO(openhands): description` (TypeScript, Java, Rust, etc.) + +**Custom Identifiers:** +You can use custom TODO identifiers like `TODO(myteam)`, `TODO[urgent]`, etc. Configure this in the workflow parameters. + +### Scanner CLI Usage + +You can also run the scanner directly from the command line: + +```bash +# Scan current directory with default identifier +python scanner.py . + +# Scan with custom identifier +python scanner.py . --identifier "TODO(myteam)" + +# Scan specific directory and save to file +python scanner.py /path/to/code --output todos.json + +# Get help +python scanner.py --help +``` + +## Configuration + +Edit `.github/workflows/todo-management.yml` to customize: + +```yaml +env: + LLM_MODEL: openhands/claude-sonnet-4-5-20250929 + # LLM_BASE_URL: 'https://custom-api.example.com' # Optional + MAX_TODOS: 3 # Maximum number of TODOs to process + TODO_IDENTIFIER: "TODO(openhands)" # Customize the identifier +``` + +## Output Summary + +The workflow generates a comprehensive summary showing: +- All processed TODOs with their file locations +- Associated pull request URLs for successful implementations +- Processing status (success, partial, failed) for each TODO + +Example output: +``` +āœ… Processed 3 TODOs + +šŸ“ TODO #1: Add input validation for user email + File: src/utils.py:45 + PR: https://github.com/user/repo/pull/123 + Status: āœ… Success + +šŸ“ TODO #2: Implement caching mechanism + File: src/api.py:78 + PR: 
https://github.com/user/repo/pull/124 + Status: āœ… Success +``` + +## Best Practices + +- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow +- **Clear Descriptions** - Write descriptive TODO comments for better AI understanding +- **Review PRs** - Always review the generated PRs before merging +- **Custom Identifiers** - Use team-specific identifiers like `TODO(backend)` for different teams +- **Scheduled Runs** - Set up cron schedules for regular TODO processing + +## Related Documentation + +- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) +- [Scanner Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) +- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) From 2ccaabf8a43ebbae7dedc2e3a41714f3e5e08390 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Wed, 22 Oct 2025 18:00:01 -0400 Subject: [PATCH 39/58] update workflow --- docs.json | 3 +++ sdk/guides/github-workflows/pr-review.mdx | 6 +++++- sdk/guides/github-workflows/routine-maintenance.mdx | 2 +- sdk/guides/github-workflows/todo-management.mdx | 2 +- 4 files changed, 10 insertions(+), 3 deletions(-) diff --git a/docs.json b/docs.json index 00574e7a..3f392f8e 100644 --- a/docs.json +++ b/docs.json @@ -235,6 +235,9 @@ { "group": "GitHub Workflows", "pages": [ + "sdk/guides/github-workflows/pr-review", + "sdk/guides/github-workflows/routine-maintenance", + "sdk/guides/github-workflows/todo-management" ] } ] diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx index 41977f29..a36aa8b2 100644 --- a/sdk/guides/github-workflows/pr-review.mdx +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -1,5 
+1,5 @@ --- -title: PR Review Workflow +title: PR Review description: Automate pull request reviews with AI-powered code analysis using GitHub Actions. --- @@ -9,6 +9,10 @@ This example is available on GitHub: [examples/github_workflows/02_pr_review/](h Automatically review pull requests when labeled, providing comprehensive feedback on code quality, security, and best practices. +```yaml icon="yaml" expandable agent-sdk/examples/03_github_workflows/01_basic_action/workflow.yml + +``` + ## Quick Start ```bash diff --git a/sdk/guides/github-workflows/routine-maintenance.mdx b/sdk/guides/github-workflows/routine-maintenance.mdx index 86b42168..55336226 100644 --- a/sdk/guides/github-workflows/routine-maintenance.mdx +++ b/sdk/guides/github-workflows/routine-maintenance.mdx @@ -1,5 +1,5 @@ --- -title: Routine Maintenance Workflow +title: Routine Maintenance description: Automate routine maintenance tasks with GitHub Actions and OpenHands agents. --- diff --git a/sdk/guides/github-workflows/todo-management.mdx b/sdk/guides/github-workflows/todo-management.mdx index b22fb354..37067c80 100644 --- a/sdk/guides/github-workflows/todo-management.mdx +++ b/sdk/guides/github-workflows/todo-management.mdx @@ -1,5 +1,5 @@ --- -title: TODO Management Workflow +title: TODO Management description: Automate TODO implementation with AI-powered code changes using GitHub Actions. --- From 972d3cd3d496ec97f0f47440d73fa01bd2beebf0 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 20:44:53 +0200 Subject: [PATCH 40/58] Revise TODO Management guide for OpenHands Agent Updated the description and features of the TODO Management guide, emphasizing the use of OpenHands Agent for implementation and improved PR management. 
--- .../github-workflows/todo-management.mdx | 532 +----------------- 1 file changed, 8 insertions(+), 524 deletions(-) diff --git a/sdk/guides/github-workflows/todo-management.mdx b/sdk/guides/github-workflows/todo-management.mdx index 37067c80..895f0db2 100644 --- a/sdk/guides/github-workflows/todo-management.mdx +++ b/sdk/guides/github-workflows/todo-management.mdx @@ -1,13 +1,13 @@ --- title: TODO Management -description: Automate TODO implementation with AI-powered code changes using GitHub Actions. +description: Implement TODOs using OpenHands Agent --- This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) -Automatically scan your codebase for TODO comments and let the AI agent implement them, creating pull requests with the changes. This showcases practical automation and self-improving codebase capabilities. +Automatically scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on who contributed to the code near the TODO. ## Quick Start @@ -26,535 +26,19 @@ cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflow # Example: # TODO(openhands): Add input validation for user email ``` -## Features - -- **Smart Scanning** - Finds legitimate TODO comments with configurable identifiers -- **AI Implementation** - Uses OpenHands agent to automatically implement TODOs -- **PR Management** - Creates feature branches and pull requests automatically -- **Progress Tracking** - Tracks TODO processing status and PR creation -- **Comprehensive Reporting** - Detailed GitHub Actions summary with processing status -- **Configurable** - Customizable TODO identifiers and processing limits - -## How It Works - -### 1. 
Scan Phase - -The workflow scans your codebase for configurable TODO comments: - -```python icon="python" expandable examples/03_github_workflows/03_todo_management/scanner.py -#!/usr/bin/env python3 -""" -TODO Scanner for OpenHands Automated TODO Management - -Scans for configurable TODO comments in Python, TypeScript, Java, and Rust files. -Default identifier: TODO(openhands) -""" - -import argparse -import json -import logging -import os -import re -import sys -from pathlib import Path - - -# Configure logging -logging.basicConfig( - level=logging.INFO, - format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", - handlers=[ - # Log to stderr to avoid JSON interference - logging.StreamHandler(sys.stderr), - ], -) -logger = logging.getLogger(__name__) - - -def scan_file_for_todos( - file_path: Path, todo_identifier: str = "TODO(openhands)" -) -> list[dict]: - """Scan a single file for configurable TODO comments.""" - # Only scan specific file extensions - if file_path.suffix.lower() not in {".py", ".ts", ".java", ".rs"}: - logger.debug(f"Skipping file {file_path} (unsupported extension)") - return [] - - # Skip test files and example files that contain mock TODOs - file_str = str(file_path) - if ( - "/test" in file_str - or "/tests/" in file_str - or "test_" in file_path.name - # Skip examples - or "examples/03_github_workflows/03_todo_management/" in file_str - ): - logger.debug(f"Skipping test/example file: {file_path}") - return [] - - logger.debug(f"Scanning file: {file_path}") - - try: - with open(file_path, encoding="utf-8", errors="ignore") as f: - lines = f.readlines() - except (OSError, UnicodeDecodeError) as e: - logger.warning(f"Failed to read file {file_path}: {e}") - return [] - - todos = [] - # Escape special regex characters in the identifier - escaped_identifier = re.escape(todo_identifier) - todo_pattern = re.compile(rf"{escaped_identifier}(?::\s*(.*))?", re.IGNORECASE) - - for line_num, line in enumerate(lines, 1): - match = 
todo_pattern.search(line) - if match: - # Extract initial description from the TODO line - description = match.group(1).strip() if match.group(1) else "" - - # Look ahead for continuation lines that are also comments - continuation_lines = [] - for next_line_idx in range(line_num, len(lines)): - next_line = lines[next_line_idx] - next_stripped = next_line.strip() - - # Check if this line is a comment continuation - if ( - next_stripped.startswith("#") - and not next_stripped.startswith(f"# {todo_identifier}") - # Skip empty comment lines - and next_stripped != "#" - # Must have content after # - and len(next_stripped) > 1 - ): - # Extract comment content (remove # and leading whitespace) - comment_content = next_stripped[1:].strip() - - if comment_content: # Only add non-empty content - continuation_lines.append(comment_content) - elif next_stripped == "#": - # Empty comment line - continue looking - continue - else: - # Stop at first non-comment line - break - - # Combine description with continuation lines - if continuation_lines: - if description: - full_description = description + " " + " ".join(continuation_lines) - else: - full_description = " ".join(continuation_lines) - else: - full_description = description - - todo_item = { - "file": str(file_path), - "line": line_num, - "description": full_description, - } - todos.append(todo_item) - logger.info(f"Found TODO in {file_path}:{line_num}: {full_description}") - - if todos: - logger.info(f"Found {len(todos)} TODO(s) in {file_path}") - return todos - - -def scan_directory( - directory: Path, todo_identifier: str = "TODO(openhands)" -) -> list[dict]: - """Recursively scan a directory for configurable TODO comments.""" - logger.info(f"Scanning directory: {directory}") - all_todos = [] - - for root, dirs, files in os.walk(directory): - # Skip hidden and common ignore directories - dirs[:] = [ - d - for d in dirs - if not d.startswith(".") - and d - not in { - "__pycache__", - "node_modules", - ".venv", - "venv", 
- "build", - "dist", - } - ] - - for file in files: - file_path = Path(root) / file - todos = scan_file_for_todos(file_path, todo_identifier) - all_todos.extend(todos) - - return all_todos - - -def main(): - """Main function to scan for TODOs and output results.""" - parser = argparse.ArgumentParser( - description="Scan codebase for configurable TODO comments" - ) - parser.add_argument( - "directory", - nargs="?", - default=".", - help="Directory to scan (default: current directory)", - ) - parser.add_argument("--output", "-o", help="Output file (default: stdout)") - parser.add_argument( - "--identifier", - "-i", - default="TODO(openhands)", - help="TODO identifier to search for (default: TODO(openhands))", - ) - - args = parser.parse_args() - - path = Path(args.directory) - if not path.exists(): - logger.error(f"Path '{path}' does not exist") - return 1 - - if path.is_file(): - logger.info(f"Starting TODO scan on file: {path}") - todos = scan_file_for_todos(path, args.identifier) - else: - logger.info(f"Starting TODO scan in directory: {path}") - todos = scan_directory(path, args.identifier) - logger.info(f"Scan complete. Found {len(todos)} total TODO(s)") - output = json.dumps(todos, indent=2) - - if args.output: - with open(args.output, "w", encoding="utf-8") as f: - f.write(output) - print(f"Found {len(todos)} TODO(s), written to {args.output}") - else: - print(output) - - return 0 - - -if __name__ == "__main__": - exit(main()) -``` - -The scanner: -- Default identifier: `TODO(openhands)` (customizable via workflow input) -- Filters out false positives (documentation, test files, quoted strings) -- Supports Python, TypeScript, Java, and Rust files (`.py`, `.ts`, `.java`, `.rs`) -- Provides detailed logging of found TODOs - -Key functions: - -```python -def scan_file_for_todos(file_path: Path, todo_identifier: str = "TODO(openhands)") -> list[dict]: - """Scan a single file for configurable TODO comments.""" -``` - -### 2. 
Process Phase - -For each TODO found, the agent script implements it: - -```python icon="python" expandable examples/03_github_workflows/03_todo_management/agent_script.py -#!/usr/bin/env python3 -""" -TODO Agent for OpenHands Automated TODO Management - -This script processes individual TODO(openhands) comments using OpenHands agent -to implement the TODO. Designed for use with GitHub Actions workflows. - -Usage: - python agent_script.py - -Arguments: - todo_json: JSON string containing TODO information from scanner.py - -Environment Variables: - LLM_API_KEY: API key for the LLM (required) - LLM_MODEL: Language model to use (default: openhands/claude-sonnet-4-5-20250929) - LLM_BASE_URL: Optional base URL for LLM API - GITHUB_TOKEN: GitHub token for creating PRs (required) - GITHUB_REPOSITORY: Repository in format owner/repo (required) - -For setup instructions and usage examples, see README.md in this directory. -""" - -import argparse -import json -import os -import sys - -from prompt import PROMPT - -from openhands.sdk import LLM, Conversation, get_logger -from openhands.tools.preset.default import get_default_agent - - -logger = get_logger(__name__) - - -def process_todo(todo_data: dict): - """Process a single TODO item using OpenHands agent.""" - file_path = todo_data["file"] - line_num = todo_data["line"] - description = todo_data["description"] - - logger.info(f"Processing TODO in {file_path}:{line_num}") - - # Configure LLM - api_key = os.getenv("LLM_API_KEY") - if not api_key: - logger.error("LLM_API_KEY environment variable is not set.") - sys.exit(1) - - model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") - base_url = os.getenv("LLM_BASE_URL") - - llm_config = { - "model": model, - "api_key": api_key, - "service_id": "agent_script", - "drop_params": True, - } +The workflow is configurable and any identifier can be used in place of `TODO(openhands)` - if base_url: - llm_config["base_url"] = base_url - - llm = LLM(**llm_config) - - # 
Create the prompt - prompt = PROMPT.format( - file_path=file_path, - line_num=line_num, - description=description, - ) - - # Get the current working directory as workspace - cwd = os.getcwd() - - # Create agent with default tools - agent = get_default_agent( - llm=llm, - cli_mode=True, - ) - - # Create conversation - conversation = Conversation( - agent=agent, - workspace=cwd, - ) - - logger.info("Starting task execution...") - logger.info(f"Prompt: {prompt[:200]}...") - - # Send the prompt and run the agent - conversation.send_message(prompt) - conversation.run() - - logger.info("Task completed successfully") - - -def main(): - """Main function to process a TODO item.""" - parser = argparse.ArgumentParser( - description="Process a TODO(openhands) comment using OpenHands agent" - ) - parser.add_argument("todo_json", help="JSON string containing TODO information") - - args = parser.parse_args() - - try: - todo_data = json.loads(args.todo_json) - except json.JSONDecodeError as e: - logger.error(f"Invalid JSON input: {e}") - sys.exit(1) - - # Validate required fields - required_fields = ["file", "line", "description"] - for field in required_fields: - if field not in todo_data: - logger.error(f"Missing required field in TODO data: {field}") - sys.exit(1) - - # Process the TODO - process_todo(todo_data) - - -if __name__ == "__main__": - main() -``` - -The agent: -1. Configures the LLM using environment variables -2. Creates a prompt with the TODO details -3. Uses the default agent with standard tools -4. Runs the conversation to implement the TODO - -Key configuration: - -```python -# Configure LLM -llm = LLM( - model=model, - api_key=api_key, - service_id="agent_script", - drop_params=True, -) - -# Create agent with default tools -agent = get_default_agent(llm=llm, cli_mode=True) - -# Create conversation -conversation = Conversation(agent=agent, workspace=cwd) -conversation.send_message(prompt) -conversation.run() -``` - -### 3. 
Prompt Template
-
-The prompt guides the agent to implement the TODO and create a PR:
-
-```python icon="python" expandable examples/03_github_workflows/03_todo_management/prompt.py
-"""Prompt template for TODO implementation."""
-
-PROMPT = """Please implement a TODO comment in a codebase.
-
-IMPORTANT - Creating a Pull Request:
-- Use the `gh pr create` command to create the PR
-- The GITHUB_TOKEN environment variable is available for authentication
-- PR Title: "[Openhands] {description}"
-- Branch name "openhands/todo/***"
-
-Your task is to:
-1. Analyze the TODO comment and understand what needs to be implemented
-2. Search GitHub for any existing PRs that address this TODO.
-   Filter by titles starting with "[Openhands]". Don't implement anything if such a PR exists
-3. Create a feature branch for this implementation
-4. Implement what is asked by the TODO
-5. Create a pull request with your changes
-6. Add 2 reviewers
-   * Tag the person who wrote the TODO as a reviewer
-   * Read the git blame information for the files, and find the most recent and
-     active contributors to the file/location of the changes.
-     Assign one of these people as a reviewer.
-
-
-Please make sure to:
-- Create a descriptive branch name related to the TODO
-- Fix the issue with clean code
-- Include a test if needed (not always necessary)
-
-TODO Details:
-- File: {file_path}
-- Line: {line_num}
-- Description: {description}
-"""
-```
-
-The prompt instructs the agent to:
-- Analyze the TODO comment
-- Check for existing PRs addressing this TODO
-- Create a feature branch (`openhands/todo/***`)
-- Implement the requested changes
-- Create a pull request with title `[Openhands] {description}`
-- Add reviewers (TODO author and active contributors)
-
-## Usage
-
-### Manual Trigger
-
-1. Go to **Actions** → "Automated TODO Management"
-2. Click **Run workflow**
-3. 
(Optional) Configure parameters: - - **Max TODOs**: Maximum number of TODOs to process (default: 3) - - **TODO Identifier**: Custom identifier to search for (default: `TODO(openhands)`) -4. Click **Run workflow** - -### Adding TODO Comments - -Add TODO comments in the following format anywhere in your codebase: - -```python -# TODO(openhands): Add input validation for user email -def process_user_email(email): - return email.lower() - -# TODO(openhands): Implement caching mechanism for API responses -def fetch_api_data(endpoint): - # Current implementation without caching - return requests.get(endpoint).json() -``` - -**Supported Comment Styles:** -- `# TODO(openhands): description` (Python, Shell, etc.) -- `// TODO(openhands): description` (TypeScript, Java, Rust, etc.) - -**Custom Identifiers:** -You can use custom TODO identifiers like `TODO(myteam)`, `TODO[urgent]`, etc. Configure this in the workflow parameters. - -### Scanner CLI Usage - -You can also run the scanner directly from the command line: - -```bash -# Scan current directory with default identifier -python scanner.py . - -# Scan with custom identifier -python scanner.py . 
--identifier "TODO(myteam)" - -# Scan specific directory and save to file -python scanner.py /path/to/code --output todos.json - -# Get help -python scanner.py --help -``` - -## Configuration - -Edit `.github/workflows/todo-management.yml` to customize: - -```yaml -env: - LLM_MODEL: openhands/claude-sonnet-4-5-20250929 - # LLM_BASE_URL: 'https://custom-api.example.com' # Optional - MAX_TODOS: 3 # Maximum number of TODOs to process - TODO_IDENTIFIER: "TODO(openhands)" # Customize the identifier -``` - -## Output Summary - -The workflow generates a comprehensive summary showing: -- All processed TODOs with their file locations -- Associated pull request URLs for successful implementations -- Processing status (success, partial, failed) for each TODO +## Features -Example output: -``` -āœ… Processed 3 TODOs - -šŸ“ TODO #1: Add input validation for user email - File: src/utils.py:45 - PR: https://github.com/user/repo/pull/123 - Status: āœ… Success - -šŸ“ TODO #2: Implement caching mechanism - File: src/api.py:78 - PR: https://github.com/user/repo/pull/124 - Status: āœ… Success -``` +- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. 
+- **Implementation** - Sends the TODO description to the OpenHands Agent that automatically implements it +- **PR Management** - Creates feature branches, pull requests and picks most relevant reviewers ## Best Practices - **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow -- **Clear Descriptions** - Write descriptive TODO comments for better AI understanding +- **Clear Descriptions** - Write descriptive TODO comments - **Review PRs** - Always review the generated PRs before merging -- **Custom Identifiers** - Use team-specific identifiers like `TODO(backend)` for different teams -- **Scheduled Runs** - Set up cron schedules for regular TODO processing ## Related Documentation From 6a417853207ecadb202766fffd85efeb114b3579 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Thu, 23 Oct 2025 18:45:09 +0000 Subject: [PATCH 41/58] docs: sync code blocks from agent-sdk examples Synced from agent-sdk ref: main --- sdk/guides/agent-server/api-sandbox.mdx | 10 ++++----- sdk/guides/agent-server/docker-sandbox.mdx | 26 +++++++++++++++------- sdk/guides/agent-server/local-server.mdx | 4 ++-- sdk/guides/custom-tools.mdx | 2 +- sdk/guides/llm-routing.mdx | 2 +- sdk/guides/metrics.mdx | 2 +- 6 files changed, 28 insertions(+), 18 deletions(-) diff --git a/sdk/guides/agent-server/api-sandbox.mdx b/sdk/guides/agent-server/api-sandbox.mdx index 438391a6..c58b0c1b 100644 --- a/sdk/guides/agent-server/api-sandbox.mdx +++ b/sdk/guides/agent-server/api-sandbox.mdx @@ -23,7 +23,7 @@ Usage: uv run examples/24_remote_convo_with_api_sandboxed_server.py Requirements: - - LITELLM_API_KEY: API key for LLM access + - LLM_API_KEY: API key for LLM access - RUNTIME_API_KEY: API key for runtime API access """ @@ -45,13 +45,13 @@ from openhands.workspace import APIRemoteWorkspace logger = get_logger(__name__) -api_key = os.getenv("LITELLM_API_KEY") -assert api_key, "LITELLM_API_KEY required" +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" llm = LLM( 
usage_id="agent", model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=os.getenv("LLM_BASE_URL"), api_key=SecretStr(api_key), ) @@ -62,7 +62,7 @@ if not runtime_api_key: with APIRemoteWorkspace( - runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_url=os.getenv("RUNTIME_API_URL", "https://runtime.eval.all-hands.dev"), runtime_api_key=runtime_api_key, server_image="ghcr.io/openhands/agent-server:main-python", ) as workspace: diff --git a/sdk/guides/agent-server/docker-sandbox.mdx b/sdk/guides/agent-server/docker-sandbox.mdx index e07b44da..6d76bde0 100644 --- a/sdk/guides/agent-server/docker-sandbox.mdx +++ b/sdk/guides/agent-server/docker-sandbox.mdx @@ -44,7 +44,7 @@ assert api_key is not None, "LLM_API_KEY environment variable is not set." llm = LLM( usage_id="agent", model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=os.getenv("LLM_BASE_URL"), api_key=SecretStr(api_key), ) @@ -258,19 +258,30 @@ assert api_key is not None, "LLM_API_KEY environment variable is not set." 
llm = LLM( usage_id="agent", model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=os.getenv("LLM_BASE_URL"), api_key=SecretStr(api_key), ) # Create a Docker-based remote workspace with extra ports for VSCode access + + +def detect_platform(): + """Detects the correct Docker platform string.""" + import platform + + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + with DockerWorkspace( base_image="nikolaik/python-nodejs:python3.12-nodejs22", host_port=18010, - # TODO: Change this to your platform if not linux/arm64 - platform="linux/arm64", + platform=detect_platform(), extra_ports=True, # Expose extra ports for VSCode and VNC ) as workspace: - """Extra ports allows you to access VSCode at localhost:8011""" + """Extra ports allows you to access VSCode at localhost:18011""" # Create agent agent = get_default_agent( @@ -441,7 +452,7 @@ assert api_key is not None, "LLM_API_KEY environment variable is not set." llm = LLM( usage_id="agent", model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=os.getenv("LLM_BASE_URL"), api_key=SecretStr(api_key), ) @@ -458,7 +469,6 @@ def detect_platform(): with DockerWorkspace( base_image="nikolaik/python-nodejs:python3.12-nodejs22", host_port=8010, - # TODO: Change this to your platform if not linux/arm64 platform=detect_platform(), extra_ports=True, # Expose extra ports for VSCode and VNC ) as workspace: @@ -492,7 +502,7 @@ with DockerWorkspace( logger.info(f"\nšŸ“‹ Conversation ID: {conversation.state.id}") logger.info("šŸ“ Sending first message...") conversation.send_message( - "Could you go to https://all-hands.dev/ blog page and summarize main " + "Could you go to https://openhands.dev/ blog page and summarize main " "points of the latest blog?" 
) conversation.run() diff --git a/sdk/guides/agent-server/local-server.mdx b/sdk/guides/agent-server/local-server.mdx index 0c0b3b1c..3aa92104 100644 --- a/sdk/guides/agent-server/local-server.mdx +++ b/sdk/guides/agent-server/local-server.mdx @@ -139,13 +139,13 @@ assert api_key is not None, "LLM_API_KEY environment variable is not set." llm = LLM( usage_id="agent", model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=os.getenv("LLM_BASE_URL"), api_key=SecretStr(api_key), ) title_gen_llm = LLM( usage_id="title-gen-llm", model="litellm_proxy/openai/gpt-5-mini", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=os.getenv("LLM_BASE_URL"), api_key=SecretStr(api_key), ) diff --git a/sdk/guides/custom-tools.mdx b/sdk/guides/custom-tools.mdx index 8426c10b..12a33b46 100644 --- a/sdk/guides/custom-tools.mdx +++ b/sdk/guides/custom-tools.mdx @@ -115,7 +115,7 @@ class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): def __init__(self, bash: BashExecutor): self.bash: BashExecutor = bash - def __call__(self, action: GrepAction) -> GrepObservation: + def __call__(self, action: GrepAction, conversation=None) -> GrepObservation: # noqa: ARG002 root = os.path.abspath(action.path) pat = shlex.quote(action.pattern) root_q = shlex.quote(root) diff --git a/sdk/guides/llm-routing.mdx b/sdk/guides/llm-routing.mdx index b76c392f..0766af67 100644 --- a/sdk/guides/llm-routing.mdx +++ b/sdk/guides/llm-routing.mdx @@ -48,7 +48,7 @@ primary_llm = LLM( secondary_llm = LLM( usage_id="agent-secondary", model="litellm_proxy/mistral/devstral-small-2507", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=base_url, api_key=SecretStr(api_key), ) multimodal_router = MultimodalRouter( diff --git a/sdk/guides/metrics.mdx b/sdk/guides/metrics.mdx index e8b73516..a960431a 100644 --- a/sdk/guides/metrics.mdx +++ b/sdk/guides/metrics.mdx @@ -332,7 +332,7 @@ conversation.run() second_llm = LLM( 
usage_id="demo-secondary", model="litellm_proxy/anthropic/claude-sonnet-4-5-20250929", - base_url="https://llm-proxy.eval.all-hands.dev", + base_url=os.getenv("LLM_BASE_URL"), api_key=SecretStr(api_key), ) conversation.llm_registry.add(second_llm) From 37880d40c1b821d03a29643b6ea3e7f8ef0b3ae6 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 20:49:55 +0200 Subject: [PATCH 42/58] Revise PR Review guide for clarity and detail Updated the description and features of the PR review guide to clarify the automation process and improve readability. --- sdk/guides/github-workflows/pr-review.mdx | 40 +++-------------------- 1 file changed, 5 insertions(+), 35 deletions(-) diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx index a36aa8b2..fe431917 100644 --- a/sdk/guides/github-workflows/pr-review.mdx +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -1,13 +1,13 @@ --- title: PR Review -description: Automate pull request reviews with AI-powered code analysis using GitHub Actions. +description: Use OpenHands Agent to generate meaningful pull request review --- This example is available on GitHub: [examples/github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/02_pr_review) -Automatically review pull requests when labeled, providing comprehensive feedback on code quality, security, and best practices. +Automatically review pull requests that is labeled with `review-this`, providing comprehensive feedback on code quality, security, and best practices. 
```yaml icon="yaml" expandable agent-sdk/examples/03_github_workflows/01_basic_action/workflow.yml @@ -28,41 +28,11 @@ cp examples/github_workflows/02_pr_review/workflow.yml .github/workflows/pr-revi ## Features -- **Automatic Trigger** - Reviews start when `review-this` label is added -- **Comprehensive Analysis** - Analyzes changes in full repository context -- **Detailed Feedback** - Covers code quality, security, best practices +- **Automatic Trigger** - Reviews start when `review-this` label is added and is posted on the PR in only 2 or 3 minutes +- **Comprehensive Analysis** - Analyzes the changes given the repository context. Covers code quality, security, best practices - **GitHub Integration** - Posts comments directly to the PR -## Usage - -### Trigger a Review - -1. Open a pull request -2. Add the `review-this` label -3. Wait for the workflow to complete -4. Review feedback posted as PR comments - -## Configuration - -Edit `.github/workflows/pr-review.yml` to customize: - -```yaml -env: - LLM_MODEL: openhands/claude-sonnet-4-5-20250929 - # LLM_BASE_URL: 'https://custom-api.example.com' # Optional -``` - -## Review Coverage - -The agent analyzes: - -- **Code Quality** - Readability, maintainability, patterns -- **Security** - Potential vulnerabilities and risks -- **Best Practices** - Language and framework conventions -- **Improvements** - Specific actionable suggestions -- **Positive Feedback** - Recognition of good practices - -## Related Documentation +## Related Files - [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/agent_script.py) - [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/workflow.yml) From 89b0452c43d0ebdbb9370acd89c9187d76315912 Mon Sep 17 00:00:00 2001 From: openhands Date: Thu, 23 Oct 2025 18:57:56 +0000 Subject: [PATCH 43/58] Replace routine-maintenance.mdx with assign-reviews.mdx - Remove routine-maintenance.mdx 
documentation - Add assign-reviews.mdx documentation following similar style to pr-review.mdx and todo-management.mdx - Update docs.json navigation to include assign-reviews instead of routine-maintenance Co-authored-by: openhands --- docs.json | 2 +- .../github-workflows/assign-reviews.mdx | 48 ++++++++++++ .../github-workflows/routine-maintenance.mdx | 74 ------------------- 3 files changed, 49 insertions(+), 75 deletions(-) create mode 100644 sdk/guides/github-workflows/assign-reviews.mdx delete mode 100644 sdk/guides/github-workflows/routine-maintenance.mdx diff --git a/docs.json b/docs.json index 3f392f8e..83a24e9d 100644 --- a/docs.json +++ b/docs.json @@ -235,8 +235,8 @@ { "group": "GitHub Workflows", "pages": [ + "sdk/guides/github-workflows/assign-reviews", "sdk/guides/github-workflows/pr-review", - "sdk/guides/github-workflows/routine-maintenance", "sdk/guides/github-workflows/todo-management" ] } diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx new file mode 100644 index 00000000..671ad5d3 --- /dev/null +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -0,0 +1,48 @@ +--- +title: Assign Reviews +description: Automatically assign reviewers to pull requests using OpenHands Agent +--- + + +This example is available on GitHub: [examples/03_github_workflows/04_assign_reviews/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/04_assign_reviews) + + +Automatically assign relevant reviewers to pull requests based on code changes, file ownership, and contributor expertise using intelligent analysis from OpenHands Agent. + +## Quick Start + +```bash +# 1. Copy workflow to your repository +cp examples/03_github_workflows/04_assign_reviews/workflow.yml .github/workflows/assign-reviews.yml + +# 2. Configure secrets in GitHub Settings → Secrets +# Add: LLM_API_KEY + +# 3. 
Configure GitHub Actions permissions +# Settings → Actions → General → Workflow permissions +# Enable: "Read and write permissions" +``` + +## Features + +- **Smart Assignment** - Analyzes code changes and assigns most relevant reviewers +- **File-based Routing** - Considers file ownership and expertise areas +- **Team Integration** - Works with GitHub teams and CODEOWNERS files +- **Automatic Trigger** - Runs when pull requests are opened or updated + +## Configuration + +The workflow can be customized through environment variables: + +```yaml +env: + MAX_REVIEWERS: 3 + EXCLUDE_AUTHOR: true + TEAM_ASSIGNMENT: true +``` + +## Related Documentation + +- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/agent_script.py) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/workflow.yml) +- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/prompt.py) \ No newline at end of file diff --git a/sdk/guides/github-workflows/routine-maintenance.mdx b/sdk/guides/github-workflows/routine-maintenance.mdx deleted file mode 100644 index 55336226..00000000 --- a/sdk/guides/github-workflows/routine-maintenance.mdx +++ /dev/null @@ -1,74 +0,0 @@ ---- -title: Routine Maintenance -description: Automate routine maintenance tasks with GitHub Actions and OpenHands agents. ---- - - -This example is available on GitHub: [examples/github_workflows/01_basic_action/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/01_basic_action) - - -Set up automated or scheduled GitHub Actions workflows to handle routine maintenance tasks like dependency updates, documentation improvements, and code cleanup. - -## Quick Start - -```bash -# 1. Copy workflow to your repository -cp examples/github_workflows/01_basic_action/workflow.yml .github/workflows/maintenance.yml - -# 2. 
Configure secrets in GitHub Settings → Secrets -# Add: LLM_API_KEY - -# 3. Configure the prompt in workflow.yml -# See below for options -``` - -## Configuration - -### Option A: Direct Prompt - -```yaml -env: - PROMPT_STRING: 'Check for outdated dependencies and create a PR to update them' - LLM_MODEL: openhands/claude-sonnet-4-5-20250929 -``` - -### Option B: Remote Prompt - -```yaml -env: - PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' - LLM_MODEL: openhands/claude-sonnet-4-5-20250929 -``` - -## Usage - -### Manual Trigger - -1. Go to **Actions** → "Maintenance Task" -2. Click **Run workflow** -3. Optionally override prompt settings -4. Click **Run workflow** - -### Scheduled Runs - -Uncomment the schedule section in `workflow.yml`: - -```yaml -on: - schedule: - - cron: "0 2 * * *" # Run at 2 AM UTC daily -``` - -## Example Use Cases - -- **Dependency Updates** - Check and update outdated packages -- **Documentation** - Update docs to reflect code changes -- **Test Coverage** - Identify and improve under-tested code -- **Linting** - Apply formatting and linting fixes -- **Link Validation** - Find and report broken links - -## Related Documentation - -- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/agent_script.py) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/01_basic_action/workflow.yml) -- [GitHub Actions Cron Syntax](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule) From 34eb647f7dbe3fc80d2efce8e95779b19a24344f Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 21:07:28 +0200 Subject: [PATCH 44/58] Simplify assign reviews guide content Removed features and configuration sections to streamline the guide. 
--- .../github-workflows/assign-reviews.mdx | 24 +++---------------- 1 file changed, 3 insertions(+), 21 deletions(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index 671ad5d3..d5ee9221 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -7,7 +7,7 @@ description: Automatically assign reviewers to pull requests using OpenHands Age This example is available on GitHub: [examples/03_github_workflows/04_assign_reviews/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/04_assign_reviews)
-Automatically assign relevant reviewers to pull requests based on code changes, file ownership, and contributor expertise using intelligent analysis from OpenHands Agent. +Automatically assign relevant reviewers to pull requests based on code changes and file ownership using OpenHands Agent. ## Quick Start @@ -23,26 +23,8 @@ cp examples/03_github_workflows/04_assign_reviews/workflow.yml .github/workflows # Enable: "Read and write permissions" ``` -## Features - -- **Smart Assignment** - Analyzes code changes and assigns most relevant reviewers -- **File-based Routing** - Considers file ownership and expertise areas -- **Team Integration** - Works with GitHub teams and CODEOWNERS files -- **Automatic Trigger** - Runs when pull requests are opened or updated - -## Configuration - -The workflow can be customized through environment variables: - -```yaml -env: - MAX_REVIEWERS: 3 - EXCLUDE_AUTHOR: true - TEAM_ASSIGNMENT: true -``` - -## Related Documentation +## Related Files - [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/agent_script.py) - [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/workflow.yml) -- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/prompt.py) \ No newline at end of file +- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/prompt.py) From 71a33658b24a28cb47b71c1c2e82f081d479bcc1 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 21:08:01 +0200 Subject: [PATCH 45/58] Update description of OpenHands Agent's TODO handling Clarified the process of how the OpenHands Agent handles TODO comments by specifying that it picks reviewers based on code changes and file ownership. 
--- sdk/guides/github-workflows/todo-management.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/todo-management.mdx b/sdk/guides/github-workflows/todo-management.mdx index 895f0db2..97f7f81f 100644 --- a/sdk/guides/github-workflows/todo-management.mdx +++ b/sdk/guides/github-workflows/todo-management.mdx @@ -7,7 +7,7 @@ description: Implement TODOs using OpenHands Agent This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) -Automatically scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on who contributed to the code near the TODO. +Automatically scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership ## Quick Start From 5b64e519813dd4f70e306087e6abba6a9eb0a548 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 21:09:35 +0200 Subject: [PATCH 46/58] Fix grammar in PR review guide --- sdk/guides/github-workflows/pr-review.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx index fe431917..a0cf0559 100644 --- a/sdk/guides/github-workflows/pr-review.mdx +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -7,7 +7,7 @@ description: Use OpenHands Agent to generate meaningful pull request review This example is available on GitHub: [examples/github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/02_pr_review) -Automatically review pull requests that is labeled with `review-this`, providing comprehensive feedback on code quality, 
security, and best practices. +Automatically review pull requests when labeled with `review-this`, providing comprehensive feedback on code quality, security, and best practices. ```yaml icon="yaml" expandable agent-sdk/examples/03_github_workflows/01_basic_action/workflow.yml From d9f2ad381236faecf9a2edff8df0a465659b499d Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 21:11:45 +0200 Subject: [PATCH 47/58] Update description for assigning reviewers --- sdk/guides/github-workflows/assign-reviews.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index d5ee9221..673e3ca7 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -1,6 +1,6 @@ --- title: Assign Reviews -description: Automatically assign reviewers to pull requests using OpenHands Agent +description: Automatically assign relevant reviewers to pull requests using OpenHands Agent --- From 83271a87d7bc594b688c7cde172ac5a4f7de9cc2 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 21:14:30 +0200 Subject: [PATCH 48/58] Simplify feedback description in PR review guide Removed 'comprehensive' from the feedback description for clarity. 
--- sdk/guides/github-workflows/pr-review.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx index a0cf0559..a79a9077 100644 --- a/sdk/guides/github-workflows/pr-review.mdx +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -7,7 +7,7 @@ description: Use OpenHands Agent to generate meaningful pull request review This example is available on GitHub: [examples/github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/02_pr_review) -Automatically review pull requests when labeled with `review-this`, providing comprehensive feedback on code quality, security, and best practices. +Automatically review pull requests when labeled with `review-this`, providing feedback on code quality, security, and best practices. ```yaml icon="yaml" expandable agent-sdk/examples/03_github_workflows/01_basic_action/workflow.yml From 422dcd6957aae14fc66061ac419007feb1fe750b Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 21:15:20 +0200 Subject: [PATCH 49/58] Update wording for TODO comment scanning description --- sdk/guides/github-workflows/todo-management.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/todo-management.mdx b/sdk/guides/github-workflows/todo-management.mdx index 97f7f81f..e9f71ab6 100644 --- a/sdk/guides/github-workflows/todo-management.mdx +++ b/sdk/guides/github-workflows/todo-management.mdx @@ -7,7 +7,7 @@ description: Implement TODOs using OpenHands Agent This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) -Automatically scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant 
reviewers based on code changes and file ownership +Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership ## Quick Start From d48eb0b8a25367efeb9d5cdb17510e338749fbb6 Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Fri, 24 Oct 2025 03:20:11 +0800 Subject: [PATCH 50/58] Apply suggestion from @xingyaoww --- sdk/guides/github-workflows/pr-review.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx index a79a9077..6d5a18e2 100644 --- a/sdk/guides/github-workflows/pr-review.mdx +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -4,7 +4,7 @@ description: Use OpenHands Agent to generate meaningful pull request review --- -This example is available on GitHub: [examples/github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/github_workflows/02_pr_review) +This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) Automatically review pull requests when labeled with `review-this`, providing feedback on code quality, security, and best practices. From efe24fc138b69e272e92a8254ad5e76a5e43552a Mon Sep 17 00:00:00 2001 From: Xingyao Wang Date: Fri, 24 Oct 2025 03:21:12 +0800 Subject: [PATCH 51/58] Update sdk/guides/github-workflows/pr-review.mdx --- sdk/guides/github-workflows/pr-review.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx index 6d5a18e2..eada3f5b 100644 --- a/sdk/guides/github-workflows/pr-review.mdx +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -17,7 +17,7 @@ Automatically review pull requests when labeled with `review-this`, providing fe ```bash # 1. 
Copy workflow to your repository -cp examples/github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml +cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml # 2. Configure secrets in GitHub Settings → Secrets # Add: LLM_API_KEY From 20f8afcad06117f4c04b1e6fc6cd4336c63b2b92 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:02:12 +0200 Subject: [PATCH 52/58] Update links in assign-reviews documentation --- sdk/guides/github-workflows/assign-reviews.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index 673e3ca7..6e3a10ac 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -4,9 +4,9 @@ description: Automatically assign relevant reviewers to pull requests using Open --- -This example is available on GitHub: [examples/03_github_workflows/04_assign_reviews/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/04_assign_reviews) +This example is available on GitHub: [examples/03_github_workflows/04_assign_reviews/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/.github/workflows/assign-reviews.yml) - +.github/workflows/assign-reviews.yml Automatically assign relevant reviewers to pull requests based on code changes and file ownership using OpenHands Agent. 
## Quick Start @@ -25,6 +25,6 @@ cp examples/03_github_workflows/04_assign_reviews/workflow.yml .github/workflows ## Related Files -- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/agent_script.py) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/workflow.yml) -- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/04_assign_reviews/prompt.py) +- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/.github/workflows/assign-reviews.yml) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) From 52e32189a76299493c8b5731cc30a61e1a59ef11 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:03:50 +0200 Subject: [PATCH 53/58] Reorder related files in assign-reviews.mdx --- sdk/guides/github-workflows/assign-reviews.mdx | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index 6e3a10ac..bc230295 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -25,6 +25,5 @@ cp examples/03_github_workflows/04_assign_reviews/workflow.yml .github/workflows ## Related Files -- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/.github/workflows/assign-reviews.yml) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) -- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) +- [Agent 
Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/.github/workflows/assign-reviews.yml) From 6b6a689cd7e138443dcfa8f791d8e046424ee48c Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:07:19 +0200 Subject: [PATCH 54/58] Clarify GitHub workflow and related files Updated the description of the GitHub workflow for assigning reviews and clarified the related files section. --- sdk/guides/github-workflows/assign-reviews.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index bc230295..26edb801 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -8,6 +8,7 @@ This example is available on GitHub: [examples/03_github_workflows/04_assign_rev .github/workflows/assign-reviews.yml Automatically assign relevant reviewers to pull requests based on code changes and file ownership using OpenHands Agent. +This workflow uses the basic action workflow template that allows you to perform any basic action with the OpenHands Agent.
## Quick Start @@ -25,5 +26,5 @@ cp examples/03_github_workflows/04_assign_reviews/workflow.yml .github/workflows ## Related Files -- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Basic Action Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) - [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/.github/workflows/assign-reviews.yml) From 573a145efc8fb2e0b63f3f04f1e428e10e911341 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:49:53 +0200 Subject: [PATCH 55/58] Update GitHub link for assign reviews example --- sdk/guides/github-workflows/assign-reviews.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index 26edb801..6454d461 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -4,7 +4,7 @@ description: Automatically assign relevant reviewers to pull requests using Open --- -This example is available on GitHub: [examples/03_github_workflows/04_assign_reviews/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/.github/workflows/assign-reviews.yml) +This example is available on GitHub: [assign_reviews.yml](https://github.com/All-Hands-AI/agent-sdk/tree/main/.github/workflows/assign-reviews.yml) .github/workflows/assign-reviews.yml Automatically assign relevant reviewers to pull requests based on code changes and file ownership using OpenHands Agent. 
From 011afcec8f618c5090b8104503ef5b0a907c4019 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:51:14 +0200 Subject: [PATCH 56/58] Fix link formatting in assign-reviews.mdx --- sdk/guides/github-workflows/assign-reviews.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index 6454d461..2eaab0f8 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -4,7 +4,7 @@ description: Automatically assign relevant reviewers to pull requests using Open --- -This example is available on GitHub: [assign_reviews.yml](https://github.com/All-Hands-AI/agent-sdk/tree/main/.github/workflows/assign-reviews.yml) +This example is available on GitHub: [.github/workflows/assign-reviews.yml](https://github.com/All-Hands-AI/agent-sdk/tree/main/.github/workflows/assign-reviews.yml) .github/workflows/assign-reviews.yml Automatically assign relevant reviewers to pull requests based on code changes and file ownership using OpenHands Agent. 
From 63c44d0146aa6cd0391c79a79d36cd4ab387b634 Mon Sep 17 00:00:00 2001 From: simonrosenberg <157206163+simonrosenberg@users.noreply.github.com> Date: Thu, 23 Oct 2025 23:52:07 +0200 Subject: [PATCH 57/58] Update assign-reviews.mdx --- sdk/guides/github-workflows/assign-reviews.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index 2eaab0f8..1e600fd8 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -6,7 +6,7 @@ description: Automatically assign relevant reviewers to pull requests using Open This example is available on GitHub: [.github/workflows/assign-reviews.yml](https://github.com/All-Hands-AI/agent-sdk/tree/main/.github/workflows/assign-reviews.yml) -.github/workflows/assign-reviews.yml + Automatically assign relevant reviewers to pull requests based on code changes and file ownership using OpenHands Agent. This workflow uses the basic action workflow template that allows you to perform any basic action with the OpenHands Agent.
From 48cb4249726c73fd6019358e0e8573e4143abaa1 Mon Sep 17 00:00:00 2001 From: enyst Date: Thu, 23 Oct 2025 22:08:58 +0000 Subject: [PATCH 58/58] docs(sdk): point agent-sdk links to OpenHands/agent-sdk and fix incorrect example paths - Replace All-Hands-AI/agent-sdk with OpenHands/agent-sdk in 3 new guides - Use blob for file links and tree for directories - Fix PR Review links to correct examples/03_github_workflows paths Co-authored-by: openhands --- sdk/guides/github-workflows/assign-reviews.mdx | 6 +++--- sdk/guides/github-workflows/pr-review.mdx | 8 ++++---- sdk/guides/github-workflows/todo-management.mdx | 10 +++++----- 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/sdk/guides/github-workflows/assign-reviews.mdx b/sdk/guides/github-workflows/assign-reviews.mdx index 1e600fd8..2fbdee28 100644 --- a/sdk/guides/github-workflows/assign-reviews.mdx +++ b/sdk/guides/github-workflows/assign-reviews.mdx @@ -4,7 +4,7 @@ description: Automatically assign relevant reviewers to pull requests using Open -This example is available on GitHub: [.github/workflows/assign-reviews.yml](https://github.com/All-Hands-AI/agent-sdk/tree/main/.github/workflows/assign-reviews.yml) +This example is available on GitHub: [.github/workflows/assign-reviews.yml](https://github.com/OpenHands/agent-sdk/blob/main/.github/workflows/assign-reviews.yml) Automatically assign relevant reviewers to pull requests based on code changes and file ownership using OpenHands Agent.
@@ -26,5 +26,5 @@ cp examples/03_github_workflows/04_assign_reviews/workflow.yml .github/workflows ## Related Files -- [Basic Action Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/.github/workflows/assign-reviews.yml) +- [Basic Action Agent Script](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/OpenHands/agent-sdk/blob/main/.github/workflows/assign-reviews.yml) diff --git a/sdk/guides/github-workflows/pr-review.mdx b/sdk/guides/github-workflows/pr-review.mdx index eada3f5b..824dbb3b 100644 --- a/sdk/guides/github-workflows/pr-review.mdx +++ b/sdk/guides/github-workflows/pr-review.mdx @@ -4,7 +4,7 @@ description: Use OpenHands Agent to generate meaningful pull request review --- -This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) +This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) Automatically review pull requests when labeled with `review-this`, providing feedback on code quality, security, and best practices. 
@@ -34,6 +34,6 @@ cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-r ## Related Files -- [Agent Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/agent_script.py) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/workflow.yml) -- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/github_workflows/02_pr_review/prompt.py) +- [Agent Script](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py) +- [Workflow File](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py) diff --git a/sdk/guides/github-workflows/todo-management.mdx b/sdk/guides/github-workflows/todo-management.mdx index e9f71ab6..9d0426fb 100644 --- a/sdk/guides/github-workflows/todo-management.mdx +++ b/sdk/guides/github-workflows/todo-management.mdx @@ -4,7 +4,7 @@ description: Implement TODOs using OpenHands Agent --- -This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/All-Hands-AI/agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) +This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/agent-sdk/tree/main/examples/03_github_workflows/03_todo_management) Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership @@ -42,7 +42,7 @@ The workflow is configurable and any identifier can be used in place of `TODO(op ## Related Documentation -- [Agent 
Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) -- [Scanner Script](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) -- [Workflow File](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) -- [Prompt Template](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) +- [Agent Script](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) +- [Scanner Script](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) +- [Workflow File](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py)
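
Across these guides the workflows share one configuration pattern: repository secrets hold the LLM credentials, environment variables select the model and (optionally) the prompt and base URL, and a GitHub event such as a label, a schedule, or a PR update triggers the agent script. A minimal sketch of that pattern, assuming a hypothetical job body (the trigger, env vars, and secret name come from the guides above; the steps are illustrative and differ from the real workflow.yml files):

```yaml
# Hypothetical sketch of a label-triggered review workflow in the style
# these guides describe; the job steps are placeholders, not the actual
# contents of the linked workflow.yml.
name: PR Review
on:
  pull_request:
    types: [labeled]              # the guides trigger on the `review-this` label
jobs:
  review:
    if: github.event.label.name == 'review-this'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write        # "Read and write permissions" per the Quick Start
    env:
      LLM_MODEL: openhands/claude-sonnet-4-5-20250929
      LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
      # LLM_BASE_URL: 'https://custom-api.example.com'  # Optional
    steps:
      - uses: actions/checkout@v4
      - run: python agent_script.py   # placeholder for the example's agent script
```

Scheduled variants (as in the removed routine-maintenance guide) swap the `pull_request` trigger for `schedule` with a cron expression and supply the task via `PROMPT_STRING` or `PROMPT_LOCATION`.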