An AI agent that runs as a Cloudflare Durable Object. It connects to an LLM, executes tools (shell, files, search), and streams everything over WebSocket.
1,700 lines of TypeScript. 6 source files. Zero frameworks on the frontend.
```
Browser ─── WebSocket ───▶ PaiAgent DO ───▶ LLM (Claude / GPT-4o)
                                │
                                └──▶ ShellSession DO (exec, read, write)
```
```sh
git clone https://github.com/acoyfellow/pai-agent.git
cd pai-agent
npm install

# Add your API key (any one of these)
echo 'OPENAI_API_KEY=sk-...' > .dev.vars
# or: ANTHROPIC_API_KEY=sk-ant-...
# or: OPENROUTER_API_KEY=sk-or-...

npx wrangler dev
# Open http://localhost:8787
```

That's it. Type a message. Watch the agent think, call tools, and respond.
- Your message goes over WebSocket to the `PaiAgent` Durable Object
- The DO calls the LLM with your message + tool definitions
- If the LLM wants to use a tool → the DO executes it via `ShellSession`
- The tool result goes back to the LLM → it can call more tools or respond
- This loops up to 25 times until the LLM gives a final answer
- Every step streams to the browser in real time
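The loop can be sketched like this (function names and types are illustrative stand-ins, not the actual `src/agent.ts` API; only the 25-iteration cap and the event shapes come from this README):

```typescript
// Minimal sketch of the agent loop described above. callLLM, executeTool,
// and broadcast are hypothetical injected dependencies, not the real code.
type ToolCall = { name: string; arguments: Record<string, unknown> };
type LLMResponse = { content: string; toolCalls: ToolCall[] };

const MAX_ITERATIONS = 25;

async function runAgentLoop(
  callLLM: (history: string[]) => Promise<LLMResponse>,
  executeTool: (call: ToolCall) => Promise<string>,
  broadcast: (event: unknown) => void,
  userMessage: string,
): Promise<string> {
  const history = [userMessage];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const response = await callLLM(history);
    if (response.toolCalls.length === 0) {
      broadcast({ type: "done" });
      return response.content; // final answer: stop looping
    }
    for (const call of response.toolCalls) {
      broadcast({ type: "tool_call", tool: call });
      const result = await executeTool(call);
      broadcast({ type: "tool_result", result });
      history.push(result); // feed the tool result back to the LLM
    }
  }
  return "Max iterations reached";
}
```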
```
src/
  index.ts   → HTTP router (Hono) + WebSocket routing (agents-sdk)
  agent.ts   → PaiAgent Durable Object — the core agent loop
  llm.ts     → LLM provider abstraction (Anthropic, OpenAI, OpenRouter)
  tools.ts   → Tool definitions: shell_exec, read_file, write_file, search_files, list_directory, think
  shell.ts   → ShellSession Durable Object — sandboxed file/command execution
  types.ts   → All TypeScript interfaces
public/
  index.html → Chat UI (vanilla HTML/CSS/JS, no build step)
migrations/
  0001_init.sql → D1 schema (sessions + messages)
```
```sh
# Create the D1 database
npx wrangler d1 create pai-agent-db

# Copy the database_id into wrangler.jsonc

# Run migrations
npx wrangler d1 migrations apply pai-agent-db --remote

# Set your LLM API key
npx wrangler secret put OPENAI_API_KEY

# Deploy
npx wrangler deploy
```

Open `src/tools.ts`. Each tool is an object:
```ts
{
  name: "my_tool",
  description: "What this tool does (the LLM reads this)",
  parameters: {
    type: "object",
    properties: {
      input: { type: "string", description: "..." },
    },
    required: ["input"],
  },
  execute: async (args, ctx) => {
    // ctx.shellExec(), ctx.readFile(), ctx.writeFile(), ctx.broadcast()
    return "result string shown to the LLM";
  },
}
```

Add it to the `TOOLS` array. The agent picks it up automatically.
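For example, a hypothetical `word_count` tool in that shape (the `ToolContext` type below is a stand-in for the real `ctx`, not the project's actual interface):

```typescript
// A hypothetical tool following the shape above. ToolContext is an
// illustrative subset of the ctx helpers listed in the README.
type ToolContext = {
  shellExec: (cmd: string) => Promise<string>;
  readFile: (path: string) => Promise<string>;
};

const wordCountTool = {
  name: "word_count",
  description: "Count the words in a file (the LLM reads this)",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "File to count words in" },
    },
    required: ["path"],
  },
  execute: async (args: { path: string }, ctx: ToolContext) => {
    const text = await ctx.readFile(args.path);
    const count = text.split(/\s+/).filter(Boolean).length;
    return `${args.path}: ${count} words`; // result string shown to the LLM
  },
};
```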
The agent auto-detects which key you've set:
| Key | Provider | Default model |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic (direct) | `claude-sonnet-4-20250514` |
| `OPENAI_API_KEY` | OpenAI | `gpt-4o` |
| `OPENROUTER_API_KEY` | OpenRouter | `anthropic/claude-sonnet-4-20250514` |
Priority: Anthropic → OpenAI → OpenRouter. Set any one in `.dev.vars` for local development, or with `wrangler secret put` for production.
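The priority order can be sketched as follows (illustrative, not the actual `src/llm.ts` code; the model names come from the table above):

```typescript
// Sketch of the key auto-detection described above, in priority order:
// Anthropic → OpenAI → OpenRouter.
type Env = {
  ANTHROPIC_API_KEY?: string;
  OPENAI_API_KEY?: string;
  OPENROUTER_API_KEY?: string;
};

function detectProvider(env: Env): { provider: string; model: string } {
  if (env.ANTHROPIC_API_KEY)
    return { provider: "anthropic", model: "claude-sonnet-4-20250514" };
  if (env.OPENAI_API_KEY)
    return { provider: "openai", model: "gpt-4o" };
  if (env.OPENROUTER_API_KEY)
    return { provider: "openrouter", model: "anthropic/claude-sonnet-4-20250514" };
  throw new Error("No LLM API key set");
}
```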
Send (client → server):

```json
{"type": "message", "content": "your question"}
{"type": "configure", "model": "gpt-4o"}
{"type": "cancel"}
```

Receive (server → client):

```json
{"type": "status", "status": "thinking"}
{"type": "message", "id": "...", "role": "assistant", "content": "...", "timestamp": 123}
{"type": "tool_call", "messageId": "...", "tool": {"name": "shell_exec", "arguments": {"command": "ls"}}}
{"type": "tool_result", "messageId": "...", "result": {"content": "...", "isError": false}}
{"type": "done"}
{"type": "error", "message": "..."}
```

All responses follow `{ ok, command, result, error, fix, next_actions }`.
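A minimal client-side handler for those server events might look like this (the dispatch function is illustrative, not the code in `public/index.html`; the event shapes come from the protocol above):

```typescript
// Dispatch one server → client event to a log sink. Event shapes follow
// the protocol listed above; the function itself is a hypothetical sketch.
function handleServerEvent(raw: string, log: (line: string) => void): void {
  const event = JSON.parse(raw);
  switch (event.type) {
    case "status":      log(`status: ${event.status}`); break;
    case "message":     log(`${event.role}: ${event.content}`); break;
    case "tool_call":   log(`calling ${event.tool.name}`); break;
    case "tool_result": log(event.result.isError ? "tool error" : "tool ok"); break;
    case "done":        log("done"); break;
    case "error":       log(`error: ${event.message}`); break;
  }
}

// Browser wiring (sessionId would come from POST /api/sessions):
// const ws = new WebSocket(`ws://localhost:8787/agents/pai-agent/${sessionId}`);
// ws.onopen = () => ws.send(JSON.stringify({ type: "message", content: "hello" }));
// ws.onmessage = (e) => handleServerEvent(e.data, console.log);
```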
| Method | Path | Description |
|---|---|---|
| GET | `/api` | Health + endpoint discovery |
| GET | `/api/sessions` | List sessions |
| POST | `/api/sessions` | Create session → returns `wsUrl` |
| DELETE | `/api/sessions/:id` | Delete session |
| WS | `/agents/pai-agent/:id` | WebSocket connection |
Why Durable Objects? Each agent session is a stateful, long-lived object. The DO holds the conversation in memory, maintains a WebSocket connection to the browser, and orchestrates the LLM ↔ tool loop without any external state management. When the session is idle, Cloudflare hibernates it. When a message arrives, it wakes up with full state intact.
Why a separate ShellSession DO? Isolation. Each agent gets its own sandboxed execution environment. In production, this maps to a Cloudflare Container. In the prototype, it simulates a filesystem in memory.
Why not use the Vercel AI SDK / LangChain / etc.? This is 236 lines of LLM glue (`src/llm.ts`). It calls `fetch()`. It parses JSON. Adding a framework would triple the dependency tree to save maybe 50 lines.
| What | Why |
|---|---|
| Cloudflare Workers | Edge runtime, zero cold start |
| Durable Objects | Stateful WebSocket + agent state |
| `agents` SDK | WebSocket routing, DO lifecycle |
| Hono | HTTP routing (7 kB) |
| D1 | SQLite at the edge |
| Anthropic / OpenAI | LLM providers |
| Vanilla HTML/JS | No build step for the frontend |
MIT