
pai-agent

An AI agent that runs as a Cloudflare Durable Object. It connects to an LLM, executes tools (shell, files, search), and streams everything over WebSocket.

1,700 lines of TypeScript. 6 source files. Zero frameworks on the frontend.

Browser ─── WebSocket ───▶ PaiAgent DO ───▶ LLM (Claude / GPT-4o)
                                │
                                └──▶ ShellSession DO (exec, read, write)

Quickstart

git clone https://github.com/acoyfellow/pai-agent.git
cd pai-agent
npm install

# Add your API key (any one of these)
echo 'OPENAI_API_KEY=sk-...' > .dev.vars
# or: ANTHROPIC_API_KEY=sk-ant-...
# or: OPENROUTER_API_KEY=sk-or-...

npx wrangler dev
# Open http://localhost:8787

That's it. Type a message. Watch the agent think, call tools, and respond.

What happens when you send a message

  1. Your message goes over WebSocket to the PaiAgent Durable Object
  2. The DO calls the LLM with your message + tool definitions
  3. If the LLM wants to use a tool → the DO executes it via ShellSession
  4. Tool result goes back to the LLM → it can call more tools or respond
  5. This loops up to 25 times until the LLM gives a final answer
  6. Every step streams to the browser in real-time
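The loop above can be sketched in a few lines. This is an illustrative reduction, not the actual code in src/agent.ts; `callLLM` and `runTool` are hypothetical names standing in for the real LLM and tool plumbing:

```typescript
// Sketch of the agent loop described above. Names (callLLM, runTool) are
// illustrative stand-ins, not exports of agent.ts.
type LLMReply =
  | { kind: "tool_call"; name: string; args: unknown }
  | { kind: "final"; content: string };

async function agentLoop(
  callLLM: (history: string[]) => Promise<LLMReply>,
  runTool: (name: string, args: unknown) => Promise<string>,
  userMessage: string,
  maxSteps = 25, // step 5: the loop is bounded
): Promise<string> {
  const history = [userMessage];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callLLM(history); // step 2: message + tools to the LLM
    if (reply.kind === "final") return reply.content; // final answer, done
    // steps 3-4: execute the tool, feed the result back to the LLM
    const result = await runTool(reply.name, reply.args);
    history.push(`tool:${reply.name} -> ${result}`);
  }
  return "Step limit reached without a final answer.";
}
```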

Project structure

src/
  index.ts      → HTTP router (Hono) + WebSocket routing (agents-sdk)
  agent.ts      → PaiAgent Durable Object — the core agent loop
  llm.ts        → LLM provider abstraction (Anthropic, OpenAI, OpenRouter)
  tools.ts      → Tool definitions: shell_exec, read_file, write_file, search_files, list_directory, think
  shell.ts      → ShellSession Durable Object — sandboxed file/command execution
  types.ts      → All TypeScript interfaces
public/
  index.html    → Chat UI (vanilla HTML/CSS/JS, no build step)
migrations/
  0001_init.sql → D1 schema (sessions + messages)

How to deploy

# Create the D1 database
npx wrangler d1 create pai-agent-db
# Copy the database_id into wrangler.jsonc

# Run migrations
npx wrangler d1 migrations apply pai-agent-db --remote

# Set your LLM API key
npx wrangler secret put OPENAI_API_KEY

# Deploy
npx wrangler deploy

How to add a new tool

Open src/tools.ts. Each tool is an object:

{
  name: "my_tool",
  description: "What this tool does (the LLM reads this)",
  parameters: {
    type: "object",
    properties: {
      input: { type: "string", description: "..." },
    },
    required: ["input"],
  },
  execute: async (args, ctx) => {
    // ctx.shellExec(), ctx.readFile(), ctx.writeFile(), ctx.broadcast()
    return "result string shown to the LLM";
  },
}

Add it to the TOOLS array. The agent picks it up automatically.
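Because `execute` only depends on its `args` and `ctx`, a tool can be sanity-checked in isolation with a stub context. The tool below (`word_count`) and the narrow `ctx` type are hypothetical examples, not code from the repo; the real context type lives in src/types.ts:

```typescript
// Hypothetical tool, exercised with a stubbed ctx outside the Durable Object.
const myTool = {
  name: "word_count",
  description: "Count the words in a file",
  parameters: {
    type: "object",
    properties: { path: { type: "string", description: "File to count" } },
    required: ["path"],
  },
  execute: async (
    args: { path: string },
    ctx: { readFile: (path: string) => Promise<string> }, // subset of the real ctx
  ) => {
    const text = await ctx.readFile(args.path);
    return String(text.trim().split(/\s+/).length);
  },
};

// Stub ctx: no DO, no sandbox, just a canned file.
const stubCtx = { readFile: async () => "three word file" };
```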

How to swap LLM providers

The agent auto-detects which key you've set:

| Key | Provider | Default model |
| --- | --- | --- |
| ANTHROPIC_API_KEY | Anthropic (direct) | claude-sonnet-4-20250514 |
| OPENAI_API_KEY | OpenAI | gpt-4o |
| OPENROUTER_API_KEY | OpenRouter | anthropic/claude-sonnet-4-20250514 |

Priority: Anthropic → OpenAI → OpenRouter. Set any one of them in .dev.vars for local development, or with wrangler secret put for production.
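The priority rule is a simple cascade. A minimal sketch, assuming a `detectProvider` helper that is illustrative rather than the actual export of src/llm.ts:

```typescript
// Sketch of the key-detection priority: Anthropic → OpenAI → OpenRouter.
interface Env {
  ANTHROPIC_API_KEY?: string;
  OPENAI_API_KEY?: string;
  OPENROUTER_API_KEY?: string;
}

function detectProvider(env: Env): { provider: string; model: string } {
  if (env.ANTHROPIC_API_KEY)
    return { provider: "anthropic", model: "claude-sonnet-4-20250514" };
  if (env.OPENAI_API_KEY)
    return { provider: "openai", model: "gpt-4o" };
  if (env.OPENROUTER_API_KEY)
    return { provider: "openrouter", model: "anthropic/claude-sonnet-4-20250514" };
  throw new Error("No LLM API key set");
}
```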

WebSocket protocol

Send (client → server):

{"type": "message", "content": "your question"}
{"type": "configure", "model": "gpt-4o"}
{"type": "cancel"}

Receive (server → client):

{"type": "status", "status": "thinking"}
{"type": "message", "id": "...", "role": "assistant", "content": "...", "timestamp": 123}
{"type": "tool_call", "messageId": "...", "tool": {"name": "shell_exec", "arguments": {"command": "ls"}}}
{"type": "tool_result", "messageId": "...", "result": {"content": "...", "isError": false}}
{"type": "done"}
{"type": "error", "message": "..."}

HTTP API

All responses follow { ok, command, result, error, fix, next_actions }.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api | Health + endpoint discovery |
| GET | /api/sessions | List sessions |
| POST | /api/sessions | Create session → returns wsUrl |
| DELETE | /api/sessions/:id | Delete session |
| WS | /agents/pai-agent/:id | WebSocket connection |
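A client can lean on the uniform envelope to handle every endpoint the same way. `ApiEnvelope` mirrors the `{ ok, command, result, error, fix, next_actions }` shape above; `unwrap` is a hypothetical helper, not part of the repo:

```typescript
// Consume the uniform response envelope described above.
interface ApiEnvelope<T> {
  ok: boolean;
  command: string;
  result?: T;
  error?: string;
  fix?: string;
  next_actions?: string[];
}

function unwrap<T>(envelope: ApiEnvelope<T>): T {
  if (!envelope.ok || envelope.result === undefined) {
    throw new Error(envelope.error ?? `'${envelope.command}' failed`);
  }
  return envelope.result;
}

// Typical use: create a session, then open the returned wsUrl.
// const res = await fetch("http://localhost:8787/api/sessions", { method: "POST" });
// const { wsUrl } = unwrap<{ wsUrl: string }>(await res.json());
```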

Architecture

Why Durable Objects? Each agent session is a stateful, long-lived object. The DO holds the conversation in memory, maintains a WebSocket connection to the browser, and orchestrates the LLM ↔ tool loop without any external state management. When the session is idle, Cloudflare hibernates it. When a message arrives, it wakes up with full state intact.

Why a separate ShellSession DO? Isolation. Each agent gets its own sandboxed execution environment. In production, this maps to a Cloudflare Container. In the prototype, it simulates a filesystem in memory.

Why not use the Vercel AI SDK / LangChain / etc.? This is 236 lines of LLM glue (src/llm.ts). It calls fetch(). It parses JSON. Adding a framework would triple the dependency tree to save maybe 50 lines.
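To make the "fetch + JSON" point concrete: the body of an OpenAI-style chat-completions call with tools is just a small object. Field names below follow the public OpenAI API; `buildChatBody` is a hypothetical helper for illustration, not code from src/llm.ts:

```typescript
// The request body an OpenAI-style chat-completions call needs for tool use.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

function buildChatBody(
  model: string,
  messages: { role: string; content: string }[],
  tools: ToolDef[],
) {
  return {
    model,
    messages,
    // OpenAI wraps each tool definition in { type: "function", function: ... }
    tools: tools.map((t) => ({ type: "function", function: t })),
  };
}

// The actual call is then a single fetch:
// await fetch("https://api.openai.com/v1/chat/completions", {
//   method: "POST",
//   headers: { authorization: `Bearer ${key}`, "content-type": "application/json" },
//   body: JSON.stringify(buildChatBody("gpt-4o", messages, TOOLS)),
// });
```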

Stack

| What | Why |
| --- | --- |
| Cloudflare Workers | Edge runtime, zero cold start |
| Durable Objects | Stateful WebSocket + agent state |
| agents SDK | WebSocket routing, DO lifecycle |
| Hono | HTTP routing (7kb) |
| D1 | SQLite at the edge |
| Anthropic / OpenAI | LLM providers |
| Vanilla HTML/JS | No build step for the frontend |

License

MIT
