Skip to content

Commit b419a23

Browse files
xingyaowwopenhands-agentall-hands-botgithub-actions[bot]
authored
SDK Docs for local "guides" (#41)
* docs: standardize SDK guides format and merge reasoning documentation - Standardized all 30 SDK guide files with consistent format: - Added <Note> components with GitHub example links - Used auto-sync code blocks with 'icon' and 'expandable' attributes - Added 'Running the Example' bash code blocks - Included brief explanations with line highlights - Added 'Next Steps' sections with related links - Merged anthropic-thinking.mdx and responses-reasoning.mdx into model-reasoning.mdx: - Created unified guide covering both Anthropic thinking blocks and OpenAI responses reasoning - Single example demonstrating both approaches - Deleted old separate documentation files - Updated docs.json navigation: - Organized all SDK guides into logical categories: * Getting Started (hello-world, custom-tools, mcp) * Agent Configuration (llm-registry, llm-routing, model-reasoning) * Conversation Management (persistence, pause-and-resume, confirmation-mode, etc.) * Agent Capabilities (activate-skill, async, planning-agent-workflow, etc.) * Agent Behavior (stuc * Agent Behavior (stuc * Agent Behavior (stuc * Ag n * Agent Behavior (stuc * Agent Behavnces in sdk/arch/sdk/llm.mdx Co-authored-by: openhands <openhands@all-hands.dev> * remove agent-server api * done with llm and condenser * sync(openapi): agent-sdk/main cab92fc * rename tab to sdk * merge convo cost * merge confirmation mode * docs: sync code blocks from agent-sdk examples Synced from agent-sdk ref: main * link to llm registry * rename and some improvements * fix image input * fix pause and resume * improve secrets * done with interactive terminal * audited browser-use doc * add stuck detector * improve custom planning agent * rename * fix broken link * remove stuff not ready yet * fix typo * docs: sync code blocks from agent-sdk examples Synced from agent-sdk ref: main --------- Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: all-hands-bot <contact@all-hands.dev> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent a419dbb commit b419a23

21 files changed

+3343
-12
lines changed

docs.json

Lines changed: 33 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -174,7 +174,7 @@
174174
]
175175
},
176176
{
177-
"tab": "Agent SDK (v1)",
177+
"tab": "SDK",
178178
"pages": [
179179
"sdk/index",
180180
"sdk/getting-started",
@@ -184,6 +184,38 @@
184184
"sdk/guides/hello-world",
185185
"sdk/guides/custom-tools",
186186
"sdk/guides/mcp",
187+
"sdk/guides/skill",
188+
"sdk/guides/context-condenser",
189+
"sdk/guides/security",
190+
"sdk/guides/metrics",
191+
"sdk/guides/secrets",
192+
{
193+
"group": "LLM Features",
194+
"pages": [
195+
"sdk/guides/llm-registry",
196+
"sdk/guides/llm-routing",
197+
"sdk/guides/llm-reasoning",
198+
"sdk/guides/llm-image-input"
199+
]
200+
},
201+
{
202+
"group": "Agent Features",
203+
"pages": [
204+
"sdk/guides/agent-interactive-terminal",
205+
"sdk/guides/agent-browser-use",
206+
"sdk/guides/agent-custom",
207+
"sdk/guides/agent-stuck-detector"
208+
]
209+
},
210+
{
211+
"group": "Conversation Features",
212+
"pages": [
213+
"sdk/guides/convo-persistence",
214+
"sdk/guides/convo-pause-and-resume",
215+
"sdk/guides/convo-send-message-while-running",
216+
"sdk/guides/convo-async"
217+
]
218+
},
187219
{
188220
"group": "Remote Agent Server",
189221
"pages": [
@@ -226,10 +258,6 @@
226258
{
227259
"tab": "OpenHands (Core) API",
228260
"openapi": "openapi/openapi.json"
229-
},
230-
{
231-
"tab": "Agent SDK (API)",
232-
"openapi": "openapi/agent-sdk.json"
233261
}
234262
],
235263
"global": {

sdk/guides/agent-browser-use.mdx

Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
---
2+
title: Browser Use
3+
description: Enable web browsing and interaction capabilities for your agent.
4+
---
5+
6+
<Note>
7+
This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/All-Hands-AI/agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py)
8+
</Note>
9+
10+
The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. Built on top of [browser-use](https://github.com/browser-use/browser-use), it provides capabilities for navigating websites, clicking elements, filling forms, and extracting content - all through natural language instructions.
11+
12+
```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py
13+
import os
14+
15+
from pydantic import SecretStr
16+
17+
from openhands.sdk import (
18+
LLM,
19+
Agent,
20+
Conversation,
21+
Event,
22+
LLMConvertibleEvent,
23+
get_logger,
24+
)
25+
from openhands.sdk.tool import Tool, register_tool
26+
from openhands.tools.browser_use import BrowserToolSet
27+
from openhands.tools.execute_bash import BashTool
28+
from openhands.tools.file_editor import FileEditorTool
29+
30+
31+
logger = get_logger(__name__)
32+
33+
# Configure LLM
34+
api_key = os.getenv("LLM_API_KEY")
35+
assert api_key is not None, "LLM_API_KEY environment variable is not set."
36+
model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929")
37+
base_url = os.getenv("LLM_BASE_URL")
38+
llm = LLM(
39+
usage_id="agent",
40+
model=model,
41+
base_url=base_url,
42+
api_key=SecretStr(api_key),
43+
)
44+
45+
# Tools
46+
cwd = os.getcwd()
47+
register_tool("BashTool", BashTool)
48+
register_tool("FileEditorTool", FileEditorTool)
49+
register_tool("BrowserToolSet", BrowserToolSet)
50+
tools = [
51+
Tool(
52+
name="BashTool",
53+
),
54+
Tool(name="FileEditorTool"),
55+
Tool(name="BrowserToolSet"),
56+
]
57+
58+
# If you need fine-grained browser control, you can manually register individual browser
59+
# tools by creating a BrowserToolExecutor and providing factories that return customized
60+
# Tool instances before constructing the Agent.
61+
62+
# Agent
63+
agent = Agent(llm=llm, tools=tools)
64+
65+
llm_messages = [] # collect raw LLM messages
66+
67+
68+
def conversation_callback(event: Event):
69+
if isinstance(event, LLMConvertibleEvent):
70+
llm_messages.append(event.to_llm_message())
71+
72+
73+
conversation = Conversation(
74+
agent=agent, callbacks=[conversation_callback], workspace=cwd
75+
)
76+
77+
conversation.send_message(
78+
"Could you go to https://openhands.dev/ blog page and summarize main "
79+
"points of the latest blog?"
80+
)
81+
conversation.run()
82+
83+
84+
print("=" * 100)
85+
print("Conversation finished. Got the following LLM messages:")
86+
for i, message in enumerate(llm_messages):
87+
print(f"Message {i}: {str(message)[:200]}")
88+
```
89+
90+
```bash Running the Example
91+
export LLM_API_KEY="your-api-key"
92+
cd agent-sdk
93+
uv run python examples/01_standalone_sdk/15_browser_use.py
94+
```
95+
96+
## How It Works
97+
98+
The example demonstrates combining multiple tools to create a capable web research agent:
99+
100+
1. **BrowserToolSet**: Provides automated browser control for web interaction
101+
2. **FileEditorTool**: Allows the agent to read and write files if needed
102+
3. **BashTool**: Enables command-line operations for additional functionality
103+
104+
The agent uses these tools to:
105+
- Navigate to specified URLs
106+
- Interact with web page elements (clicking, scrolling, etc.)
107+
- Extract and analyze content from web pages
108+
- Summarize information from multiple sources
109+
110+
In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points.
111+
112+
## Customization
113+
114+
For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually register individual browser tools. Refer to the [BrowserToolSet definition](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands-tools/openhands/tools/browser_use/definition.py) to see the available individual tools and create a `BrowserToolExecutor` with customized tool configurations before constructing the Agent. This gives you fine-grained control over which browser capabilities are exposed to the agent.
115+
116+
## Next Steps
117+
118+
- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools
119+
- **[MCP Integration](/sdk/guides/mcp)** - Connect external services

0 commit comments

Comments
 (0)