214 changes: 214 additions & 0 deletions sdk/guides/agent-tom.mdx
@@ -0,0 +1,214 @@
---
title: Theory of Mind (ToM) Agent Integration
description: Enable personalized user understanding and guidance through ToM agent integration for better handling of vague or ambiguous tasks.
---

## Overview

The Theory of Mind (ToM) agent integration lets your agent model the user's intent and preferences. When a task is vague or ambiguous, the agent can consult the ToM agent for personalized guidance based on conversation history and user patterns.

This feature is useful when:
- User instructions are unclear or under-specified
- You need help understanding what the user actually wants
- You want guidance on the best approach for the current task
- You want to build up user preferences and patterns from conversation history

## Quick Start

<Note>
This example is available on GitHub: [examples/01_standalone_sdk/25_tom_agent.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/25_tom_agent.py)
</Note>

```python icon="python" expandable examples/01_standalone_sdk/25_tom_agent.py
"""Example demonstrating Tom agent with Theory of Mind capabilities.

This example shows how to set up an agent with Tom tools for getting
personalized guidance based on user modeling. Tom tools include:
- TomConsultTool: Get guidance for vague or unclear tasks
- SleeptimeComputeTool: Index conversations for user modeling
"""

import os

from pydantic import SecretStr

from openhands.sdk import LLM, Agent, Conversation
from openhands.sdk.tool import Tool, register_tool
from openhands.tools.preset.default import get_default_tools, register_default_tools
from openhands.tools.tom_consult import SleeptimeComputeTool, TomConsultTool
from openhands.tools.tom_consult.action import SleeptimeComputeAction


# Configure LLM
api_key: str | None = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."

llm: LLM = LLM(
    model="openhands/claude-sonnet-4-5-20250929",
    api_key=SecretStr(api_key),
    usage_id="agent",
    drop_params=True,
)

# Register tools (default tools + Tom tools)
register_default_tools(enable_browser=False)  # CLI mode, no browser
register_tool("TomConsultTool", TomConsultTool)
register_tool("SleeptimeComputeTool", SleeptimeComputeTool)

# Build tools list with Tom tools
tools = get_default_tools(enable_browser=False)

# Configure Tom tools with parameters
tom_params: dict[str, bool | str] = {
    "enable_rag": True,  # Enable RAG in Tom agent
}

# Add LLM configuration for Tom tools (uses same LLM as main agent)
tom_params["llm_model"] = llm.model
if llm.api_key:
    if isinstance(llm.api_key, SecretStr):
        tom_params["api_key"] = llm.api_key.get_secret_value()
    else:
        tom_params["api_key"] = llm.api_key
if llm.base_url:
    tom_params["api_base"] = llm.base_url

# Add both Tom tools to the agent
tools.append(Tool(name="TomConsultTool", params=tom_params))
tools.append(Tool(name="SleeptimeComputeTool", params=tom_params))

# Create agent with Tom capabilities
# This agent can consult Tom for personalized guidance
# Note: Tom's user modeling data will be stored in ~/.openhands/
agent: Agent = Agent(llm=llm, tools=tools)

# Start conversation
cwd: str = os.getcwd()
PERSISTENCE_DIR = os.path.expanduser("~/.openhands")
CONVERSATIONS_DIR = os.path.join(PERSISTENCE_DIR, "conversations")
conversation = Conversation(
    agent=agent, workspace=cwd, persistence_dir=CONVERSATIONS_DIR
)

# Optionally run sleeptime compute to index existing conversations
# This builds user preferences and patterns from conversation history
sleeptime_compute_tool = conversation.agent.tools_map.get("sleeptime_compute")
if sleeptime_compute_tool and sleeptime_compute_tool.executor:
    print("\nRunning sleeptime compute to index conversations...")
    sleeptime_result = sleeptime_compute_tool.executor(
        SleeptimeComputeAction(), conversation
    )
    print(f"Result: {sleeptime_result.message}")
    print(f"Sessions processed: {sleeptime_result.sessions_processed}")

# Send a potentially vague message where Tom consultation might help
conversation.send_message(
    "I need to debug some code but I'm not sure where to start. "
    + "Can you help me figure out the best approach?"
)
conversation.run()

print("\n" + "=" * 80)
print("Tom agent consultation example completed!")
print("=" * 80)


# Optional: Index this conversation for Tom's user modeling
# This builds user preferences and patterns from conversation history
# Uncomment the lines below to index the conversation:
#
# conversation.send_message("Please index this conversation using sleeptime_compute")
# conversation.run()
# print("\nConversation indexed for user modeling!")

```

## Key Concepts

### TomConsultTool

The `TomConsultTool` allows your agent to consult the ToM agent for guidance. It analyzes:
- The current user message
- Conversation history and context
- User patterns from previous interactions

The tool returns personalized suggestions on how to approach the task.

### SleeptimeComputeTool

The `SleeptimeComputeTool` processes conversation history to build and update the user model. This tool:
- Indexes completed conversations
- Extracts user preferences and patterns
- Updates the ToM agent's understanding of the user

This tool is typically run at the end of a conversation or when the user explicitly requests indexing.

## Configuration

### Required Dependencies

The ToM agent integration requires the `tom-swe` package, which is included as an optional dependency:

```bash
pip install "openhands-tools[tom]"  # if the tom extra is available; quotes keep shells such as zsh from expanding the brackets
# or install directly:
pip install tom-swe
```

### Tool Parameters

Both tools accept the following parameters:

- `enable_rag` (bool): Enable RAG capabilities in the ToM agent (default: True)
- `llm_model` (str): LLM model to use for ToM agent
- `api_key` (str): API key for the ToM agent's LLM
- `api_base` (str): Base URL for the ToM agent's LLM API
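
The parameters above can be collected with a small helper before they are handed to the tools. The following is a minimal sketch, not part of the SDK: `build_tom_params` is a hypothetical name, and the function only mirrors the four keys listed above, leaving out optional values that are not set.

```python
from __future__ import annotations


def build_tom_params(
    llm_model: str,
    api_key: str | None = None,
    api_base: str | None = None,
    enable_rag: bool = True,
) -> dict[str, bool | str]:
    """Hypothetical helper (not part of the SDK): collect Tom tool parameters."""
    params: dict[str, bool | str] = {
        "enable_rag": enable_rag,
        "llm_model": llm_model,
    }
    # Only include credentials and endpoints that are actually configured.
    if api_key:
        params["api_key"] = api_key
    if api_base:
        params["api_base"] = api_base
    return params


# Placeholder values for illustration only.
example = build_tom_params("openhands/claude-sonnet-4-5-20250929", api_key="sk-placeholder")
print(sorted(example))
```

The resulting dictionary can be passed as `params` to both `Tool(name="TomConsultTool", ...)` and `Tool(name="SleeptimeComputeTool", ...)`, as in the Quick Start example.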

### Data Storage

User modeling data is stored in `~/.openhands/` by default. This includes:
- User preferences and patterns
- Processed conversation history
- RAG indices for efficient retrieval

## Best Practices

### When to Use ToM Consultation

Use the ToM consultation when:
- User messages are vague or ambiguous
- Multiple valid approaches exist and you need guidance
- You want to personalize responses based on user history
- Task requirements are under-specified

### Conversation Indexing

For best results:
- Index conversations after they're complete
- Run sleeptime compute periodically to update the user model
- Ensure sufficient conversation history exists before expecting personalized guidance
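
One way to check for sufficient history is to count persisted conversations before relying on personalized guidance. The sketch below is illustrative only: it assumes that each conversation is stored as its own sub-directory under `~/.openhands/conversations` (the `persistence_dir` used in the Quick Start example). The helper names and the threshold of three conversations are arbitrary choices, not SDK behavior.

```python
from pathlib import Path

# Assumption: one sub-directory per persisted conversation, under the
# persistence_dir used in the Quick Start example.
DEFAULT_CONVERSATIONS_DIR = Path.home() / ".openhands" / "conversations"


def count_persisted_conversations(
    conversations_dir: Path = DEFAULT_CONVERSATIONS_DIR,
) -> int:
    """Hypothetical helper: count sub-directories in the persistence folder."""
    if not conversations_dir.is_dir():
        return 0
    return sum(1 for entry in conversations_dir.iterdir() if entry.is_dir())


def has_enough_history(
    conversations_dir: Path = DEFAULT_CONVERSATIONS_DIR,
    minimum: int = 3,  # arbitrary illustrative threshold
) -> bool:
    """Return True when at least `minimum` conversations are persisted."""
    return count_persisted_conversations(conversations_dir) >= minimum


if __name__ == "__main__":
    print(f"Persisted conversations: {count_persisted_conversations()}")
    print(f"Enough history for personalization: {has_enough_history()}")
```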

## Troubleshooting

### Import Errors

If you encounter import errors with `tom-swe`:

```bash
# The imports are lazy-loaded, so they only fail when the Tom tools are actually used.
# Make sure tom-swe is installed:
pip install tom-swe
```
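
Because the imports are lazy, it can help to check up front whether the package is present. This is a minimal sketch using only the standard library. It assumes that the `tom-swe` distribution exposes a top-level module named `tom_swe`, which this guide does not confirm; adjust the name if your installation differs. `is_importable` is a hypothetical helper name.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


# Assumption: the tom-swe distribution installs a module named "tom_swe".
if is_importable("tom_swe"):
    print("tom-swe found; the Tom tools should be able to import it.")
else:
    print("tom-swe not found; install it with: pip install tom-swe")
```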

### Heavy Dependencies

`tom-swe` depends on scientific Python packages (numpy, scipy, pandas). These packages are:
- Required for running the ToM agent
- Excluded from the binary build of openhands-agent-server
- Only needed if you're using ToM features

## Related

- [Agent Delegation](/sdk/guides/agent-delegation) - Delegate tasks to sub-agents
- [Custom Tools](/sdk/guides/custom-tools) - Create your own agent tools
- [Conversation Persistence](/sdk/guides/convo-persistence) - Persist conversation history