A demo showcasing an app maintained by a long-running agent.
Production apps need constant care: errors and stack traces to triage, slow endpoints to investigate, libraries to upgrade, regressions to roll back. That work eats developer time and can require on-call rotations.
This repo is demo that showcases what is possible in pushing those tasks onto a long-horizon agent.
- The demo product. The demo product is a Next.js travel-planner app.
- The managing agent. A cloud-hosted Anthropic Managed Agent runs every 30 minutes (and on GitHub webhooks). It monitors Vercel + Sentry, files Linear tickets for new issues, picks up tickets, writes code, runs Playwright to view it's own work in a local dev server, and opens PRs.
- The review agent. A separate reviewer agent reads each PR cold, builds, runs tests, posts approve / request-changes / escalate. Three rounds of changes → escalates to a human.
- The retro agent. The agent system is self-learning. Each session can append to
.claude/memory/. A retro agent runs daily, summarises 24h of activity in a Linear project update, and proposes memory edits as a PR.
Two cron-driven loops + webhook-driven resumes
- Manager (
/api/manager-tick, every 30 min). Reads memory, checks Vercel + Sentry, files tickets, picks up one Linear ticket, runs the dev loop locally in a sandbox (install, dev server, Playwright, build, lint), opens a PR. - Reviewer (
/api/reviewer-tick, every 30 min). Reads each open PR cold, builds, runs e2e, postsAGENT_REVIEW: APPROVED | REQUEST_CHANGES | ESCALATE. - Webhooks (
/api/github-webhook). PR opens → reviewer fires immediately. Reviewer approval → manager session resumes (full context) and squash-merges.
Daily retro (/api/retro-tick, 07:00 UTC). Reads 24h of activity, posts a Linear project update with a health field, and proposes .claude/memory/ edits as a PR.
Stack
- Anthropic Managed Agents — three agents (manager, reviewer, retro), all running in cloud sandboxes with a mounted clone of this repo.
- Linear MCP — Store and control plane for tasks; also the location where the agent provides project updates.
- GitHub MCP — PRs, reviews, files.
- Vercel MCP — deployments, logs.
- Sentry MCP — runtime errors.
- Vercel — hosting + cron + webhooks for the orchestration routes.
- Playwright — e2e tests inside the sandbox.
npm run manager # interactive REPL into a manager session
npm run manager:tick # one-shot operational loop, prints transcript
npm run manager:bootstrap # apply agent YAML changes to the live agent- Bring your own infrastructure. The agent cannot create infrastructure, set env vars, register OAuth apps, or pay for services. Bootstrapping is human-only.
- Further observability. Product analytics, metrics and so on are trivial to add once the agent is running, and thus not done here.
- Single-agent throughput. WIP limit of 1 — the manager won't pick up a second ticket while another is in flight. Keeps things simple, caps throughput.
I hope this inspires you to set up your own. If you would like to swap notes, I can be reached at willtay.com.