agentforge

A small Express + TypeScript service that exposes a Claude-backed agent with streaming, tool use, prompt caching, and an eval harness. Sibling project to tinybus and collab-board — same observability and operational shape, different language and problem domain.

Why this exists

I wanted a Node-side artifact that demonstrates the patterns most production agent backends actually need:

Streaming end-to-end: Anthropic SSE → server SSE → browser. The user sees tokens appear as the model generates them.
Tool use loop: the model issues tool_use blocks, the server executes them locally, and the results feed back into the next turn. Execution is sequential by default; the policy is one comment block away from parallel.
Prompt caching: the system prompt and tool definitions are marked cache_control: ephemeral. Cache reads are billed at roughly 10% of the base input-token price.
Idempotent failure: agent runs are safe to abort mid-stream. Assistant text is persisted only after the run completes, so an aborted run leaves no partial message behind.
Eval harness: POST /v1/evals/run exercises a fixture suite against the live model. Each case passes or fails on substring and tool-call expectations.
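A minimal sketch of the request shape behind the prompt-caching item, using plain objects instead of the SDK's types. `buildCachedParams`, the local type names, and `max_tokens: 1024` are illustrative assumptions, not code from this repo's `src/agent/cache.ts`. The Messages API caches the prompt as a prefix in the order tools, system, messages, so a `cache_control` breakpoint on the last tool covers every tool definition, and a breakpoint on the system block extends the cached prefix through the system prompt.

```typescript
// Local shapes mirroring only the Messages API request fields used here.
type CacheControl = { type: "ephemeral" };

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ToolDef {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
  cache_control?: CacheControl;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface RequestParams {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  tools: ToolDef[];
  messages: ChatMessage[];
}

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Hypothetical helper: marks the last tool and the system block as cache
// breakpoints. Because the cached prefix runs tools → system → messages,
// these two markers cover all tool definitions plus the system prompt.
function buildCachedParams(
  model: string,
  systemPrompt: string,
  tools: ToolDef[],
  messages: ChatMessage[],
): RequestParams {
  const markedTools = tools.map((tool, i) =>
    i === tools.length - 1 ? { ...tool, cache_control: EPHEMERAL } : tool,
  );
  return {
    model,
    max_tokens: 1024,
    system: [{ type: "text", text: systemPrompt, cache_control: EPHEMERAL }],
    tools: markedTools,
    messages,
  };
}
```

Prefixes shorter than the model's minimum cacheable length are not cached at all; the `usage.cache_read_input_tokens` field in the response is the quickest way to confirm a request actually hit the cache.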

Stack

Node 22, TypeScript (strict, ESM, noUncheckedIndexedAccess)
Express 4, Pino, Zod, prom-client
Anthropic SDK (@anthropic-ai/sdk ^0.40)
Postgres + raw SQL migrations (no ORM, mirrors tinybus)
Vitest + supertest

Endpoints

Method	Path	Purpose
`POST`	`/v1/chat`	Run the agent on a new user message; streams `AgentEvent`s as SSE
`POST`	`/v1/sessions`	Create a session
`GET`	`/v1/sessions/:id`	Fetch a session and its full message history
`POST`	`/v1/evals/run`	Run the eval suite, return a pass/fail report
`GET`	`/healthz`	Liveness + DB reachability check
`GET`	`/metrics`	Prometheus text format
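The eval endpoint's pass/fail rule (substring plus tool-call expectations) can be sketched as a pure function over one run's outcome. This is an assumption-laden illustration: `EvalFixture`, `RunOutcome`, `checkFixture`, `summarize`, and the report shape are hypothetical names, not the repo's actual eval code.

```typescript
// Hypothetical fixture shape; the real suite's field names may differ.
interface EvalFixture {
  name: string;
  message: string;
  expectSubstrings: string[]; // each must appear in the final assistant text
  expectTools: string[]; // each tool must be called at least once
}

// What one agent run produced, reduced to the fields the checks need.
interface RunOutcome {
  text: string;
  toolCalls: string[];
}

interface EvalResult {
  name: string;
  pass: boolean;
  failures: string[];
}

function checkFixture(fixture: EvalFixture, outcome: RunOutcome): EvalResult {
  const failures: string[] = [];
  for (const needle of fixture.expectSubstrings) {
    if (!outcome.text.includes(needle)) failures.push(`missing substring: ${JSON.stringify(needle)}`);
  }
  for (const tool of fixture.expectTools) {
    if (!outcome.toolCalls.includes(tool)) failures.push(`expected tool not called: ${tool}`);
  }
  return { name: fixture.name, pass: failures.length === 0, failures };
}

// Roll per-fixture results into a report; this report shape is an assumption.
function summarize(results: EvalResult[]): { passed: number; failed: number; results: EvalResult[] } {
  const passed = results.filter((r) => r.pass).length;
  return { passed, failed: results.length - passed, results };
}
```

Collecting every failure reason, rather than stopping at the first, keeps the report useful when a prompt change breaks several expectations at once.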

Chat request

POST /v1/chat
Content-Type: application/json

{ "session_id": "<uuid?>", "message": "Why does tinybus use SKIP LOCKED?" }

The response is text/event-stream. Each event's type is one of: session, iteration_start, text_delta, tool_call, tool_result, usage, done, error.
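A sketch of how these events might be typed and framed on the wire. The eight `type` values come from the list above; every payload field (`session_id`, `iteration`, `input_tokens`, and so on) is an assumption for illustration, as is the `toSseFrame` name.

```typescript
// Event types from the README; payload fields are hypothetical.
type AgentEvent =
  | { type: "session"; session_id: string }
  | { type: "iteration_start"; iteration: number }
  | { type: "text_delta"; text: string }
  | { type: "tool_call"; id: string; name: string; input: unknown }
  | { type: "tool_result"; id: string; output: string; is_error: boolean }
  | { type: "usage"; input_tokens: number; output_tokens: number; cache_read_input_tokens: number }
  | { type: "done" }
  | { type: "error"; message: string };

// Frame one event for text/event-stream. JSON.stringify (called without an
// indent argument) escapes newlines inside strings, so the payload always
// fits on a single data: line, and the blank line terminates the event.
function toSseFrame(event: AgentEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}
```

In an Express handler this would pair with `res.setHeader("Content-Type", "text/event-stream")` and one `res.write(toSseFrame(event))` per event; in the browser, `EventSource` or a streaming `fetch` reader can dispatch on the `event:` name.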

Local development

docker compose up -d              # starts Postgres on :5432
cp .env.example .env              # set ANTHROPIC_API_KEY
npm install
npm run migrate
npm run dev                       # http://localhost:3000

Open http://localhost:3000/ for the bundled demo UI.

Deploy to Railway

Push this folder to a GitHub repo.
New Railway project → "Deploy from GitHub" → pick the repo.
Add the Postgres plugin — Railway sets DATABASE_URL automatically.
Set ANTHROPIC_API_KEY in the service variables.
Railway runs node dist/db/migrate.js && node dist/index.js per railway.json.

The bundled Dockerfile is multi-stage and runs on a distroless base, same shape as tinybus.

Architecture

HTTP → Express middleware (pino-http, prom-client) → routes
   /v1/chat → SSE → runAgent() ──▶ Anthropic.messages.stream
                          │              │
                          │              └──▶ tool_use? ──▶ TOOLS_BY_NAME[name].execute()
                          │                                         │
                          │       ◀────────── tool_result ◀─────────┘
                          ▼
                       Postgres (sessions, messages)

runAgent is an async generator yielding AgentEvents — the route layer is a thin pump from the generator to the SSE writer. That separation makes the loop straightforward to unit-test (no HTTP) and easy to drive from the eval harness, which uses the same generator.
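The generator-plus-pump split can be sketched with the model call injected as a function, which is what makes the loop testable without HTTP or network access. Everything here is an assumption for illustration: `runAgentSketch`, `CallModel`, `pump`, and the event payloads are hypothetical, and whole text blocks stand in for the token-level deltas a streaming implementation built on `messages.stream` would emit.

```typescript
// Minimal content-block shapes modeled on the Anthropic Messages API.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

interface ModelTurn {
  content: Block[];
  stop_reason: "end_turn" | "tool_use";
}

type Msg =
  | { role: "user"; content: string | ToolResultBlock[] }
  | { role: "assistant"; content: Block[] };

// A subset of the README's event types; payload fields are assumptions.
type LoopEvent =
  | { type: "iteration_start"; iteration: number }
  | { type: "text_delta"; text: string }
  | { type: "tool_call"; name: string }
  | { type: "tool_result"; name: string; output: string }
  | { type: "done" };

// Hypothetical seam: the model call is injected rather than hard-wired.
type CallModel = (messages: Msg[]) => Promise<ModelTurn>;
type Tool = (input: unknown) => Promise<string>;

async function* runAgentSketch(
  userMessage: string,
  callModel: CallModel,
  tools: Map<string, Tool>,
  maxIterations = 5,
): AsyncGenerator<LoopEvent> {
  const messages: Msg[] = [{ role: "user", content: userMessage }];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    yield { type: "iteration_start", iteration };
    const turn = await callModel(messages);
    messages.push({ role: "assistant", content: turn.content });
    // Whole text blocks stand in for token-level deltas in this sketch.
    for (const block of turn.content) {
      if (block.type === "text") yield { type: "text_delta", text: block.text };
    }
    if (turn.stop_reason !== "tool_use") break;

    // Sequential policy: execute tool calls one at a time, in issue order.
    const results: ToolResultBlock[] = [];
    for (const block of turn.content) {
      if (block.type !== "tool_use") continue;
      yield { type: "tool_call", name: block.name };
      const tool = tools.get(block.name);
      const output = tool ? await tool(block.input) : `unknown tool: ${block.name}`;
      results.push({ type: "tool_result", tool_use_id: block.id, content: output, is_error: !tool });
      yield { type: "tool_result", name: block.name, output };
    }
    // Tool results go back to the model as the next user turn.
    messages.push({ role: "user", content: results });
  }
  // Also reached when maxIterations runs out; a real loop might emit an error event instead.
  yield { type: "done" };
}

// The route layer's whole job: drain the generator into a writer such as res.write.
async function pump(events: AsyncIterable<LoopEvent>, write: (chunk: string) => void): Promise<void> {
  for await (const event of events) {
    write(`event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`);
  }
}
```

An eval harness can consume the same generator with a plain `for await` loop and inspect the events directly, which is the reuse the paragraph above describes.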

Where to extend

Three spots have explicit TODO blocks for design decisions:

src/agent/loop.ts — tool-execution policy (sequential vs parallel, error handling).
src/agent/cache.ts — prompt-cache breakpoint strategy.
src/obs/metrics.ts — which agent-specific counters and histograms to expose.

Each block explains the trade-off and the 5–10 lines of code needed.
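One way the sequential-vs-parallel decision in src/agent/loop.ts could be expressed, sketched under assumptions: `executeTools`, `runOne`, `ToolFn`, and `Policy` are hypothetical names, and the error-handling choice shown (report failures back to the model as `tool_result` blocks with `is_error: true` instead of aborting the run) is one option among several, not necessarily the one the TODO block recommends.

```typescript
interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

type ToolFn = (input: unknown) => Promise<string>;
type Policy = "sequential" | "parallel";

// Run one call, converting an unknown tool or a thrown error into an is_error
// result so the model sees the failure and can recover. Never rejects.
async function runOne(call: ToolCall, tools: Map<string, ToolFn>): Promise<ToolResult> {
  const fn = tools.get(call.name);
  if (!fn) {
    return { type: "tool_result", tool_use_id: call.id, content: `unknown tool: ${call.name}`, is_error: true };
  }
  try {
    return { type: "tool_result", tool_use_id: call.id, content: await fn(call.input), is_error: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_use_id: call.id, content: message, is_error: true };
  }
}

async function executeTools(calls: ToolCall[], tools: Map<string, ToolFn>, policy: Policy): Promise<ToolResult[]> {
  if (policy === "parallel") {
    // Promise.all preserves call order even when tools finish out of order,
    // and because runOne never rejects, one failure cannot discard the rest.
    return Promise.all(calls.map((call) => runOne(call, tools)));
  }
  // Sequential: each tool starts only after the previous one settles.
  const results: ToolResult[] = [];
  for (const call of calls) results.push(await runOne(call, tools));
  return results;
}
```

Parallel execution cuts latency when a turn issues several independent calls; sequential remains the safer default when tools have side effects whose order matters.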

Why no ORM?

Same reason tinybus uses raw SQL: with three tables and two indexes, full SQL is easier to read than ORM-flavored SQL. If this grows past ~10 tables, reach for Drizzle.

License

MIT
