AuctorAI/durable_agents

Durable Agents

Runnable companion to "The Agent Is a Workflow That Writes Itself": recursive subagents lower to child workflows and PTC runs through a deterministic workflow-space interpreter.

The demo is deterministic by default. No LLM key is required — the model is a scripted activity that emits tool calls or final answers. You can opt into a live OpenAI or Anthropic model when you want to poke at it manually.

Quick start

uv sync --group dev
uv run python -m scripts.demo

scripts/demo.py starts temporal server start-dev plus one worker on one task queue, then drops you into a small REPL. Each prompt starts a real DemoAgentWorkflow, prints the Temporal graph it produced, and leaves any file side effects in .demo-sandbox for inspection. Open http://localhost:8233 while the REPL is running to inspect workflow histories in Temporal Web. This default path expects the Temporal CLI to be on PATH.

Use /scenarios and /scenario ptc inside the REPL for deterministic walkthroughs of direct tools, programmatic tool calling, subagents, retries, and validation feedback. To use a live provider, pass --model; otherwise the demo stays in scripted mode.

You can also run one prompt or scripted scenario and exit:

uv run python -m scripts.demo --scenario ptc
uv run python -m scripts.demo --scenario retry
uv run python -m scripts.demo "write a note in the sandbox"

If you already have a Temporal dev server running:

temporal server start-dev

uv run python -m scripts.demo --no-start-dev-server --address 127.0.0.1:7233

For a lightweight run without Temporal Web, use Temporal's in-process test server:

uv run python -m scripts.demo --in-process --scenario ptc

To run with a real provider instead of the scripted model, provide --model and the corresponding API key. scripts/demo.py loads .env automatically, so you can put keys there instead of exporting them in your shell:

echo 'OPENAI_API_KEY=...' > .env
uv run python -m scripts.demo --model gpt-5.5

# or
echo 'ANTHROPIC_API_KEY=...' > .env
uv run python -m scripts.demo --model claude-3-5-haiku-latest

Bare model names default to OpenAI except names beginning with claude, which default to Anthropic. You can also be explicit with openai:gpt-5.5 or anthropic:claude-3-5-haiku-latest.
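That resolution rule can be sketched as a small helper. This is illustrative only; the demo's actual resolver may differ:

```python
def resolve_model(name: str) -> str:
    """Resolve a bare model name to a provider-qualified id.

    Mirrors the rule described above: explicit "provider:model" strings pass
    through, bare names starting with "claude" go to Anthropic, and every
    other bare name goes to OpenAI.
    """
    if ":" in name:
        return name  # already explicit, e.g. "openai:gpt-5.5"
    provider = "anthropic" if name.startswith("claude") else "openai"
    return f"{provider}:{name}"
```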

Tests

uv run pytest

The pytest suite runs the same workflow shape in Temporal's in-process test environment and asserts that normal tools become generated activities, run_ptc and spawn stay workflow-native, inner PTC calls use the target tool's transport, retries happen at the Temporal boundary, validation failures are model-visible, and cancellation propagates as cancellation.
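One way to picture what the suite asserts is as a check on the event-type sequence of a workflow history. The sketch below uses plain data rather than the suite's actual helpers; the event names are Temporal's standard history event types, but the specific sequence is made up for illustration:

```python
# Sketch: the history of a run with one model step, one spawn, and a final
# model step, reduced to event type names. Real tests replay the workflow in
# Temporal's in-process test environment instead of hard-coding a list.
history = [
    "WorkflowExecutionStarted",
    "ActivityTaskScheduled",                 # model step -> generated activity
    "ActivityTaskCompleted",
    "StartChildWorkflowExecutionInitiated",  # spawn -> child workflow
    "ChildWorkflowExecutionCompleted",
    "ActivityTaskScheduled",                 # final model step
    "ActivityTaskCompleted",
    "WorkflowExecutionCompleted",
]

def has_child_workflow(events: list[str]) -> bool:
    """True if any spawn lowered to a child workflow rather than an activity."""
    return "StartChildWorkflowExecutionInitiated" in events
```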

Examples

Build the default durable agent

Compose normal Pydantic AI capabilities, then wrap the agent with TemporalAgent:

from datetime import timedelta

from pydantic_ai import Agent
from pydantic_ai.durable_exec.temporal import TemporalAgent
from temporalio.common import RetryPolicy

from durable_agents.agent import run_durable_subagent
from durable_agents.capabilities import DemoDeps, build_capabilities

sandbox, faults, subagents, ptc = build_capabilities(run_durable_subagent)

agent = Agent(
    "openai:gpt-5.5",
    deps_type=DemoDeps,
    name="durable_capability_agent",
    instructions="Use tools for durable work. Use spawn for bounded subagents.",
    capabilities=[sandbox, faults, subagents, ptc],
)

durable_agent = TemporalAgent(
    agent,
    model_activity_config={
        "start_to_close_timeout": timedelta(seconds=30),
        "retry_policy": RetryPolicy(maximum_attempts=1),
    },
    tool_activity_config={
        **faults.durable_tool_activity_config,
        **subagents.durable_tool_activity_config,
        **ptc.durable_tool_activity_config,
    },
)

Mark orchestration tools as workflow-native

Most tools should become generated Temporal activities. Two tools are different: spawn starts a child workflow, and run_ptc orchestrates inner tool calls inside the parent workflow. Mark those wrappers with False:

tool_activity_config = {
    "subagents": {"spawn": False},
    "ptc": {"run_ptc": False},
}

Their inner work still uses the normal durable transport. A write_file(...) inside run_ptc is a sandbox activity. A spawn(...) inside run_ptc is a child workflow.
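The effect of the False markers can be checked with plain dictionaries, mirroring the **-unpacking merge used when building the TemporalAgent. The timeout value and the flaky_call tool name below are made up for this sketch:

```python
# Illustrative merge of per-capability activity configs. Tools mapped to a
# config dict become generated activities; tools mapped to False stay
# workflow-native and run inside the workflow itself.
faults_config = {"faults": {"flaky_call": {"start_to_close_timeout_s": 10}}}
subagents_config = {"subagents": {"spawn": False}}  # workflow-native
ptc_config = {"ptc": {"run_ptc": False}}            # workflow-native

tool_activity_config = {**faults_config, **subagents_config, **ptc_config}

# Collect the (toolset, tool) pairs that bypass activity generation.
workflow_native = [
    (toolset, tool)
    for toolset, tools in tool_activity_config.items()
    for tool, cfg in tools.items()
    if cfg is False
]
```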

Add a new activity-backed capability

Capability tools are ordinary async Python functions. If they are not explicitly marked workflow-native, Pydantic AI's Temporal plugin runs them as generated activities:

from dataclasses import dataclass
from typing import cast

from pydantic_ai import RunContext
from pydantic_ai.capabilities import AbstractCapability
from pydantic_ai.toolsets import AgentToolset
from pydantic_ai.toolsets.function import FunctionToolset

from durable_agents.capabilities import DemoDeps, sandbox_path


@dataclass
class NotesCapability(AbstractCapability[DemoDeps]):
    toolset_id: str = "notes"

    def get_toolset(self) -> AgentToolset[DemoDeps]:
        async def save_note(ctx: RunContext[DemoDeps], title: str, body: str) -> str:
            target = sandbox_path(ctx.deps.sandbox_dir, f"{title}.txt")
            target.write_text(body)
            return str(target)

        return cast(AgentToolset[DemoDeps], FunctionToolset([save_note], id=self.toolset_id))

Spawn subagents

The model calls spawn. The runtime starts a child DemoAgentWorkflow on the same task queue with PARENT_CLOSE_POLICY_TERMINATE:

{
    "tool_name": "spawn",
    "args": {
        "agent_name": "generic",
        "task": "Read the sandbox files and summarize the plan.",
    },
}

Use programmatic tool calling

Use run_ptc when the model needs loops, branches, fanout, or aggregation over existing tools:

{
    "tool_name": "run_ptc",
    "args": {
        "code": """
results = await asyncio.gather(
    write_file(path="notes/a.txt", content="alpha"),
    spawn(agent_name="generic", task="Summarize alpha."),
)
results
""".strip(),
    },
}

run_ptc itself is not an activity. Its inner write_file(...) call is still a generated activity, and its inner spawn(...) call is still a child workflow.

Execution model

model step                  -> generated Temporal activity
normal capability tool      -> generated Temporal activity
subagent spawn              -> child DemoAgentWorkflow on the same task queue
programmatic tool calling   -> workflow-native PtcCapability, no run_ptc activity
PTC inner tool/spawn call   -> the same activity/child workflow as a direct call

Transient infrastructure failures retry as Temporal activities. Validation and known execution failures come back to the model as repairable feedback. Parent cancellation propagates to active child work and does not trigger another model turn.

Tool errors are routed by class in src/durable_agents/errors.py: ToolValidationError surfaces as repairable model feedback, ToolExecutionError as a known terminal failure, and ToolRetryableError opts the activity into its Temporal retry policy.
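A minimal sketch of that routing: the three class names come from the repo, but the route helper and its outcome labels are illustrative, since the real dispatch lives in src/durable_agents/errors.py:

```python
class ToolValidationError(Exception):
    """Bad tool arguments: surfaced to the model as repairable feedback."""

class ToolExecutionError(Exception):
    """Known terminal failure: reported without infrastructure retries."""

class ToolRetryableError(Exception):
    """Transient failure: opts the activity into its Temporal retry policy."""

def route(error: Exception) -> str:
    # Illustrative routing by class; outcome labels are made up here.
    if isinstance(error, ToolValidationError):
        return "model_feedback"
    if isinstance(error, ToolRetryableError):
        return "temporal_retry"
    if isinstance(error, ToolExecutionError):
        return "terminal_failure"
    return "unhandled"
```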

Testing scheme

Each scenario in tests/helpers/scenarios.py declares a scripted model route, expected sandbox side effects, and expected Temporal history shape. The matrix covers:

  • sandbox and fault tools are generated activities;
  • run_ptc and spawn stay workflow-native;
  • PTC inner calls use the target tool's activity or child-workflow transport;
  • retryable activity failures retry by Temporal policy;
  • validation failures are model-visible and not infrastructure retries;
  • child workflow failures bubble back as model-visible spawn failures;
  • PTC parallel branches cancel in-flight siblings on first failure;
  • parent cancellation interrupts direct, child, and PTC-inner work.

See tests/README.md for the full scenario catalog.

What to look at

  • src/durable_agents/capabilities/: the capability package. sandbox.py, subagents.py, ptc.py, and faults.py hold the behavior; __init__.py wires the default demo set together.
  • src/durable_agents/agent.py: builds the Pydantic AI Agent, wraps it with TemporalAgent, and marks run_ptc and spawn as workflow-native.
  • tests/helpers/scenarios.py: the scenario matrix — model scripts plus the expected Temporal graph and sandbox files.
  • tests/README.md: the test contract in prose.
  • scripts/migration_demo.py / scripts/migration_fuzz.py: see Cross-worker migration.

Cross-worker migration

scripts/migration_demo.py and scripts/migration_fuzz.py start two subprocess.Popen workers — each its own Python interpreter and SDK client — polling one task queue, then run a parent workflow that fans out subagents. They print which worker identity completed each workflow's tasks, so you can see when a child lands on a different worker than the parent.

uv run python -m scripts.migration_demo
uv run python -m scripts.migration_fuzz --iterations 10

migration_fuzz generates random workflow shapes from three families and reports per-family cross-process placement frequency:

  • direct-spawns: parent emits N parallel spawn tool calls.
  • ptc-spawns: parent emits one run_ptc that gathers N spawn calls.
  • ptc-mixed: parent emits one run_ptc that gathers W write_file calls and N spawn calls.

uv run python -m scripts.migration_fuzz --iterations 20 --workers 3 --max-n 5
uv run python -m scripts.migration_fuzz --kind ptc-spawns --iterations 10

Both scripts require the Temporal CLI on PATH. Temporal does not guarantee fair dispatch across pollers, so any single run may keep all work on one worker; the fuzzer repeats and reports how often a child landed on a different worker than the parent.
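The placement statistic the fuzzer reports can be sketched as follows; the worker identities and run records here are made up:

```python
# Illustrative cross-process placement frequency: the fraction of child
# workflows whose tasks completed on a different worker than the parent's.
runs = [
    {"parent": "worker-a", "children": ["worker-a", "worker-b"]},
    {"parent": "worker-b", "children": ["worker-b", "worker-b"]},
]

def cross_worker_fraction(runs: list[dict]) -> float:
    migrated = total = 0
    for run in runs:
        for child in run["children"]:
            total += 1
            migrated += child != run["parent"]
    return migrated / total
```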

Deliberate simplifications

This repo omits a few things the production setup has: policy gates, platform adapters, controlled worker-failover drills, and detached spawn/wait handles.

License

MIT. Take, copy, modify, and reuse the code freely. See LICENSE.
