A small Express + TypeScript service that exposes a Claude-backed agent with streaming, tool use, prompt caching, and an eval harness. Sibling project to tinybus and collab-board — same observability and operational shape, different language and problem domain.
I wanted a Node-side artifact that demonstrates the patterns most production agent backends actually need:
- Streaming end-to-end: Anthropic SSE → server SSE → browser. The user sees tokens appear as the model generates them.
- Tool use loop: the model issues `tool_use`, the server executes the tool locally, and the results feed back into the next turn. Sequential by default; the policy is one comment block away from parallel.
- Prompt caching: the system prompt and tool definitions are marked `cache_control: ephemeral`. Cache hits are billed at roughly 10% of the normal input cost.
- Idempotent failure: agent runs are safe to abort mid-stream. Assistant text is persisted only after the run completes, so an aborted run leaves no partial message behind.
- Eval harness: `POST /v1/evals/run` exercises a fixture suite against the live model. Pass/fail is checked on substring + tool-call expectations.
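As a rough illustration of the prompt-caching item above, the sketch below marks a system prompt and the last tool definition with `cache_control: { type: "ephemeral" }`. The helper name `withCacheBreakpoints` and the local types are assumptions made for this example, not code from the repo, and the breakpoint placement is only one plausible choice.

```typescript
// Local structural types that mirror the relevant parts of a Messages API
// request body. They are defined here only to keep the sketch self-contained.
type CacheControl = { type: "ephemeral" };

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ToolDef {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown> };
  cache_control?: CacheControl;
}

// Hypothetical helper: put one breakpoint on the system prompt and one on the
// last tool definition. A breakpoint on the last tool covers the tools before it.
function withCacheBreakpoints(
  systemPrompt: string,
  tools: ToolDef[],
): { system: SystemBlock[]; tools: ToolDef[] } {
  const system: SystemBlock[] = [
    { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
  ];
  const cachedTools = tools.map((tool, i): ToolDef =>
    i === tools.length - 1
      ? { ...tool, cache_control: { type: "ephemeral" } }
      : tool,
  );
  return { system, tools: cachedTools };
}
```

The returned `system` and `tools` values would be spread into the request alongside `model`, `max_tokens`, and `messages`.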
- Node 22, TypeScript (strict, ESM, `noUncheckedIndexedAccess`)
- Express 4, Pino, Zod, prom-client
- Anthropic SDK (`@anthropic-ai/sdk` ^0.40)
- Postgres + raw SQL migrations (no ORM, mirrors tinybus)
- Vitest + supertest
| Method | Path | Purpose |
|---|---|---|
| POST | `/v1/chat` | Run the agent on a new user message; streams `AgentEvent`s as SSE |
| POST | `/v1/sessions` | Create a session |
| GET | `/v1/sessions/:id` | Fetch a session and its full message history |
| POST | `/v1/evals/run` | Run the eval suite, return a pass/fail report |
| GET | `/healthz` | Liveness + DB reachability check |
| GET | `/metrics` | Prometheus text format |
```
POST /v1/chat
Content-Type: application/json

{ "session_id": "<uuid?>", "message": "Why does tinybus use SKIP LOCKED?" }
```

Response is `text/event-stream`. Each event has a `type` matching one of:
`session`, `iteration_start`, `text_delta`, `tool_call`, `tool_result`, `usage`, `done`, `error`.
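To make the wire format concrete, here is a minimal sketch of how such events could be encoded as SSE frames and decoded again. Only the `type` names come from the list above; the payload fields, the use of an `event:` line, and the helper names `toSseFrame` and `parseSseFrame` are assumptions for illustration.

```typescript
// Assumed payload shapes. Only the `type` values are taken from the README.
type AgentEvent =
  | { type: "session"; session_id: string }
  | { type: "iteration_start"; iteration: number }
  | { type: "text_delta"; text: string }
  | { type: "tool_call"; name: string; input: unknown }
  | { type: "tool_result"; name: string; output: unknown }
  | { type: "usage"; input_tokens: number; output_tokens: number }
  | { type: "done" }
  | { type: "error"; message: string };

// Encode one event as a Server-Sent Events frame: an `event:` line,
// a single `data:` line holding JSON, and a terminating blank line.
function toSseFrame(event: AgentEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Decode the `data:` line of a single frame back into an event.
// No runtime validation is done here; a real client would likely validate.
function parseSseFrame(frame: string): AgentEvent {
  const dataLine = frame.split("\n").find((line) => line.startsWith("data: "));
  if (dataLine === undefined) throw new Error("frame has no data line");
  return JSON.parse(dataLine.slice("data: ".length)) as AgentEvent;
}
```

Because `JSON.stringify` never emits raw newlines, each event fits on one `data:` line, which keeps the client-side parser trivial.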
```bash
docker compose up -d        # starts Postgres on :5432
cp .env.example .env        # set ANTHROPIC_API_KEY
npm install
npm run migrate
npm run dev                 # http://localhost:3000
```

Open http://localhost:3000/ for the bundled demo UI.
- Push this folder to a GitHub repo.
- New Railway project → "Deploy from GitHub" → pick the repo.
- Add the Postgres plugin; Railway sets `DATABASE_URL` automatically.
- Set `ANTHROPIC_API_KEY` in the service variables.
- Railway runs `node dist/db/migrate.js && node dist/index.js` per `railway.json`.
The bundled Dockerfile is multi-stage and runs on a distroless base, same shape as tinybus.
```
HTTP → Express middleware (pino-http, prom-client) → routes
/v1/chat → SSE → runAgent() ──▶ Anthropic.messages.stream
                     │                │
                     │                └──▶ tool_use? ──▶ TOOLS_BY_NAME[name].execute()
                     │                                                  │
                     │                ◀────────── tool_result ◀─────────┘
                     ▼
       Postgres (sessions, messages)
```
runAgent is an async generator yielding AgentEvents — the route layer is a thin pump from the generator to the SSE writer. That separation makes the loop straightforward to unit-test (no HTTP) and easy to drive from the eval harness, which uses the same generator.
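The separation described above can be sketched as one generator with two consumers. Everything here is a stand-in: `fakeRunAgent` yields canned events instead of calling the model, the event payloads and the `search_docs` tool name are invented, and `pump`, `collect`, and `passes` are hypothetical names rather than the repo's actual functions.

```typescript
// Reduced event set for the sketch; payload fields are assumptions.
type AgentEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_call"; name: string }
  | { type: "done" };

// Stand-in for runAgent(): yields canned events, no network calls.
async function* fakeRunAgent(message: string): AsyncGenerator<AgentEvent> {
  yield { type: "tool_call", name: "search_docs" };
  yield { type: "text_delta", text: `Echo: ${message}` };
  yield { type: "done" };
}

// Route-layer consumer: forward each event to a writer. In an Express
// handler the writer would be something like (chunk) => res.write(chunk).
async function pump(
  events: AsyncIterable<AgentEvent>,
  write: (chunk: string) => void,
): Promise<void> {
  for await (const event of events) {
    write(`data: ${JSON.stringify(event)}\n\n`);
  }
}

// Eval-harness consumer: drain the same generator into text plus tool names.
async function collect(
  events: AsyncIterable<AgentEvent>,
): Promise<{ text: string; toolCalls: string[] }> {
  let text = "";
  const toolCalls: string[] = [];
  for await (const event of events) {
    if (event.type === "text_delta") text += event.text;
    if (event.type === "tool_call") toolCalls.push(event.name);
  }
  return { text, toolCalls };
}

// Substring + tool-call expectation check for one fixture.
interface Expectation {
  contains: string[];
  tools: string[];
}

function passes(
  result: { text: string; toolCalls: string[] },
  expected: Expectation,
): boolean {
  return (
    expected.contains.every((s) => result.text.includes(s)) &&
    expected.tools.every((t) => result.toolCalls.includes(t))
  );
}
```

Neither consumer touches HTTP types, which is what lets a unit test or an eval fixture drive the loop directly.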
Three spots have explicit TODO blocks for design decisions:
- `src/agent/loop.ts`: tool-execution policy (sequential vs parallel, error handling).
- `src/agent/cache.ts`: prompt-cache breakpoint strategy.
- `src/obs/metrics.ts`: which agent-specific counters and histograms to expose.
Each block explains the trade-off and the 5–10 lines of code needed.
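To give a feel for the first of these decisions, the sketch below contrasts a sequential and a parallel tool-execution policy, with tool errors converted into `is_error` results instead of aborting the run. The `Tool` shape, the helper names, and the error-handling choice are assumptions for illustration; the repo's actual TODO block may settle on something different.

```typescript
interface ToolUse {
  id: string;
  name: string;
  input: unknown;
}

interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

// Assumed tool shape: an async execute() that returns a string.
type Tool = { execute: (input: unknown) => Promise<string> };

// Run one call. Unknown tools and thrown errors become is_error results,
// so the model can see the failure on the next turn.
async function runOne(tools: Record<string, Tool>, call: ToolUse): Promise<ToolResult> {
  const base = { type: "tool_result" as const, tool_use_id: call.id };
  const tool: Tool | undefined = tools[call.name];
  if (tool === undefined) {
    return { ...base, content: `unknown tool: ${call.name}`, is_error: true };
  }
  try {
    return { ...base, content: await tool.execute(call.input), is_error: false };
  } catch (err) {
    return { ...base, content: String(err), is_error: true };
  }
}

// Sequential: one call at a time, in the order the model issued them.
async function runSequential(tools: Record<string, Tool>, calls: ToolUse[]): Promise<ToolResult[]> {
  const results: ToolResult[] = [];
  for (const call of calls) {
    results.push(await runOne(tools, call));
  }
  return results;
}

// Parallel: all calls start together; Promise.all keeps results in input order.
function runParallel(tools: Record<string, Tool>, calls: ToolUse[]): Promise<ToolResult[]> {
  return Promise.all(calls.map((call) => runOne(tools, call)));
}
```

Because `runOne` never rejects, `Promise.all` cannot short-circuit on a single failing tool. The parallel version is only safe when the tools do not depend on each other's side effects, which is the trade-off behind a sequential default.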
There is no ORM for the same reason tinybus uses raw SQL: with three tables and two indexes, plain SQL is easier to read than ORM-flavored SQL. If this grows past ~10 tables, reach for drizzle.
MIT