feat(agent-think): repro branches + per-PR live demo deployments in the skills by mattzcarey · Pull Request #1868 · cloudflare/agents

mattzcarey · 2026-07-03T13:18:23Z

Follow-up to #1861 — skill + hardening batch. Skills are already seeded to the prod R2 bucket and the worker/image changes are deployed (runtime behavior is live; this PR records it in-repo).

reproduce — pushes the finished repro project as an orphan repro/issue-<n> branch on the target repo (the checkout IS the runnable project; re-runs force-push the same branch). Issue report + structured result carry reproBranchUrl so other agents can pull exactly what was built.

open-pr — every fix PR ships with a temporary deployment demoing the change: build + npm pack the fixed package, install the tarball into a minimal Vite demo (or the repro-branch project → broken-before/fixed-after on the same UI), wrangler deploy --temporary, demo URL + claim link in the PR body. Examples/docs touched by the change get updated in the same PR; no-runtime-surface changes skip the demo with a stated reason.

Incident hardening (RCA in-session; see also #1870): a full-monorepo pnpm install streamed unbounded output through the DO → isolate OOM → the persisted backlog burned the CPU limit on every wake, permanently bricking the session.

- The exec tool description and both skills mandate redirect-to-file + tail for noisy commands (/tmp is container-local, never synced).
- pnpm baked into the container image (drops corepack enable).
- ThinkAgent.resetSession() + AgentThink.resetSession(session) RPC: an operator escape hatch that releases the container assignment, wipes all durable session state, and aborts the isolate. RPC-only, not exposed over HTTP.
- AGENTS.md edge cases updated (WS run_worker_first, deploys reset in-flight turns, unbounded exec output).
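
A minimal sketch of the redirect-to-file + tail pattern described above, written as a helper that wraps a noisy command. The helper names `quietCommand` and `shellQuote`, the default of 40 tail lines, and the log path are illustrative assumptions, not code from this PR.

```typescript
// Quote a string for POSIX sh by wrapping it in single quotes.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Hypothetical helper: run `command` with stdout+stderr redirected to a log
// file, print only the last `tailLines` lines, and keep the command's exit
// status. The subshell keeps `exit` from closing the caller's shell.
function quietCommand(command: string, logPath: string, tailLines = 40): string {
  const log = shellQuote(logPath);
  return `( ${command} > ${log} 2>&1; status=$?; tail -n ${tailLines} ${log}; exit $status )`;
}

const wrappedInstall = quietCommand("pnpm install", "/tmp/install.log");
console.log(wrappedInstall);
```

The point of the pattern is that only a bounded tail crosses the DO boundary, while the full log stays on the container-local disk for later inspection.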

🤖 Generated with Claude Code

…branch New step: after verification, push the repro project as an orphan repro/issue-<n> branch on the target repo (checkout IS the runnable project; re-runs force-push the same branch; tar-copy excludes node_modules/dist/.wrangler/.env — the image has no rsync). The report comment and structured result now carry reproBranchUrl so other agents can pull exactly what was built. Best-effort: a protected-branch rejection is reported, not fatal.
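
A rough sketch of how the structured result could carry `reproBranchUrl` while treating a rejected push as best-effort. Only the `repro/issue-<n>` naming and the `reproBranchUrl` field come from the description above; the `ReproResult` shape, the `notes` field, and the function names are assumptions made for illustration.

```typescript
// Hypothetical result shape: only `reproBranchUrl` is named in the PR.
interface ReproResult {
  issueNumber: number;
  reproBranchUrl?: string; // left out when the push was rejected
  notes: string[];
}

function reproBranchName(issueNumber: number): string {
  return `repro/issue-${issueNumber}`;
}

function buildReproResult(
  owner: string,
  repo: string,
  issueNumber: number,
  pushSucceeded: boolean,
): ReproResult {
  const branch = reproBranchName(issueNumber);
  if (!pushSucceeded) {
    // Best-effort: report the rejection and carry on without a URL.
    return {
      issueNumber,
      notes: [`push of ${branch} was rejected; no repro branch published`],
    };
  }
  return {
    issueNumber,
    // GitHub branch URLs keep the slash in the branch name as-is.
    reproBranchUrl: `https://github.com/${owner}/${repo}/tree/${branch}`,
    notes: [],
  };
}

console.log(buildReproResult("cloudflare", "agents", 1845, true).reproBranchUrl);
```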

changeset-bot · 2026-07-03T13:18:27Z

⚠️ No Changeset found

Latest commit: aa5c004

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

devin-ai-integration

Devin Review found 2 potential issues.

devin-ai-integration · 2026-07-03T13:20:09Z

+  `reproBranchUrl`. If the push is rejected (branch protection), say so in the
+  report and continue — the branch is best-effort, the report is not.
+
+## 7. Report back on the issue


🟡 Step cross-reference in the instructions now points to the wrong step after renumbering

The report step is renumbered from 6 to 7 (## 7. Report back on the issue at agent-think/skills/reproduce/SKILL.md:291) but an earlier instruction still says "include the URL + click instructions in your report (step 6)" (agent-think/skills/reproduce/SKILL.md:87), so the agent following these instructions is directed to the wrong step.

Impact: The AI agent may look at the wrong step ("Push the repro to a branch") instead of the reporting step when following the cross-reference.

Stale cross-reference from step renumbering

Line 87 reads:

After deploy, confirm the root URL serves the page (step 5) and include the URL + click instructions in your report (step 6).

Before this PR, step 6 was "Report back on the issue". The PR inserts a new step 6 ("Push the repro to a branch") and renumbers the old step 6 to step 7, but does not update the "(step 6)" reference at line 87 to "(step 7)".

Prompt for agents

Line 87 of agent-think/skills/reproduce/SKILL.md contains the text '(step 6)' which was a cross-reference to the 'Report back on the issue' step. Since that step was renumbered from 6 to 7 in this PR, the reference at line 87 needs to be updated from '(step 6)' to '(step 7)'. The line reads: '5. After deploy, confirm the root URL serves the page (step 5) and include the URL + click instructions in your report (step 6).' Change '(step 6)' to '(step 7)'.

Fixed in eaf5997 — the reference now points at step 7.
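
To illustrate why a numeric cross-reference goes stale when a step is inserted, here is a small sketch that resolves a step by its heading name instead. The function `stepNumbers` and the heading pattern `## <n>. <title>` are assumptions based on the headings quoted in this review, not code from the repository.

```typescript
// Map each "## <n>. <title>" heading in a markdown document to its number.
function stepNumbers(markdown: string): Map<string, number> {
  const steps = new Map<string, number>();
  for (const line of markdown.split("\n")) {
    const match = /^## (\d+)\. (.+)$/.exec(line.trim());
    if (match) {
      steps.set(match[2], Number(match[1]));
    }
  }
  return steps;
}

const beforeInsert = "## 5. Verify\n## 6. Report back on the issue";
const afterInsert =
  "## 5. Verify\n## 6. Push the repro to a branch\n## 7. Report back on the issue";

// The name stays valid across the renumbering; a hard-coded "(step 6)" does not.
console.log(stepNumbers(beforeInsert).get("Report back on the issue"));
console.log(stepNumbers(afterInsert).get("Report back on the issue"));
```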

devin-ai-integration · 2026-07-03T13:20:11Z

+git checkout --orphan repro/issue-<issueNumber>
+git rm -rfq --cached . && git clean -fdq
+tar -C "$REPRO_DIR" --exclude node_modules --exclude dist \
+  --exclude .wrangler --exclude .env -cf - . | tar -xf -
+git add -A
+git commit -m "repro for #<issueNumber>: <one-line issue title>"
+git push -f origin repro/issue-<issueNumber>
+```


🚩 Destructive git operations on the repo checkout may prevent re-use

The new step 6 runs git rm -rfq --cached . && git clean -fdq on /workspace/repo (agent-think/skills/reproduce/SKILL.md:275), which wipes the working tree of the cloned repository to prepare the orphan branch. This is intentional for creating a clean orphan branch, but it means the original repo checkout is destroyed after this step. If the agent needs to reference source files from the repo for the root-cause hypothesis in step 7 (e.g., pointing at suspect file/line in packages/), the files would no longer be available. The step ordering places this after verification (step 5), so the repo has already been used for understanding the code (step 2), but the reporting step (step 7) asks for a root-cause hypothesis pointing at specific files — the agent would need to have noted those before step 6 destroys the checkout.

Fixed in eaf5997 — the publish now happens in a scratch git init dir pushed by URL (repro-publish-<n>, removed afterwards), so /workspace/repo is never touched and stays available for the root-cause hypothesis and follow-up PR work.
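
The scratch-directory publish described in this reply could look roughly like the command plan below. This is a sketch under assumptions: the `/tmp` location, the `publishPlan` name, the commit message, and pushing via a `HEAD:refs/heads/...` refspec are illustrative choices, and it presumes a git identity is already configured in the container. Only the `repro-publish-<n>` directory name, the tar excludes, and the push-by-URL idea come from the thread.

```typescript
// Build the shell commands for publishing a repro from a scratch git repo,
// so the main checkout is never modified.
function publishPlan(reproDir: string, remoteUrl: string, issueNumber: number): string[] {
  const scratch = `/tmp/repro-publish-${issueNumber}`; // assumed location
  const branch = `repro/issue-${issueNumber}`;
  return [
    `rm -rf ${scratch} && mkdir -p ${scratch}`,
    // Copy the project without build output or secrets (the image has no rsync).
    `tar -C ${reproDir} --exclude node_modules --exclude dist ` +
      `--exclude .wrangler --exclude .env -cf - . | tar -C ${scratch} -xf -`,
    `git -C ${scratch} init -q`,
    `git -C ${scratch} add -A`,
    `git -C ${scratch} commit -q -m "repro for #${issueNumber}"`,
    // Push by URL with an explicit refspec; re-runs overwrite the same branch.
    `git -C ${scratch} push -f ${remoteUrl} HEAD:refs/heads/${branch}`,
    `rm -rf ${scratch}`,
  ];
}

const plan = publishPlan("/workspace/repro", "https://github.com/cloudflare/agents.git", 1845);
console.log(plan.join("\n"));
```

Because the history is created fresh in the scratch directory, the pushed branch has no shared ancestry with the target repo, which matches the orphan-branch behaviour of the earlier version.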

pkg-pr-new · 2026-07-03T13:22:42Z

agents

npm i https://pkg.pr.new/agents@1868

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1868

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1868

create-think

npm i https://pkg.pr.new/create-think@1868

hono-agents

npm i https://pkg.pr.new/hono-agents@1868

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1868

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1868

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1868

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1868

commit: aa5c004

Every fix PR now includes a temporary deployment demoing the change: build+pack the fixed package, install the tarball into a minimal Vite demo (or reuse the repro/issue-<n> branch project), deploy --temporary, and put the demo URL + claim link in the PR body. Also: keep examples/ docs truthful in the same PR when the change affects them. Docs-only / no-runtime changes skip the demo with a stated reason.
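
As an illustration of the "skip the demo with a stated reason" rule, the sketch below decides whether a change needs a demo deployment. The path heuristics (`.md` files and a `docs/` prefix counting as docs-only) and the names `DemoDecision` and `decideDemo` are assumptions for the example; the skill itself may use different criteria.

```typescript
type DemoDecision =
  | { deploy: true }
  | { deploy: false; reason: string };

// Assumed heuristic: markdown files and anything under docs/ have no runtime surface.
function isDocsPath(path: string): boolean {
  return path.endsWith(".md") || path.startsWith("docs/");
}

function decideDemo(changedPaths: string[]): DemoDecision {
  if (changedPaths.length === 0) {
    return { deploy: false, reason: "no changed files" };
  }
  if (changedPaths.every(isDocsPath)) {
    return { deploy: false, reason: "docs-only change, no runtime surface to demo" };
  }
  return { deploy: true };
}

console.log(decideDemo(["docs/agents.md", "README.md"]));
console.log(decideDemo(["packages/agents/src/index.ts", "README.md"]));
```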

RCA (PLANS/agents/agent-think-1845-rca.md): a full-monorepo pnpm install streamed ~8 min of output through the DO, OOMing the isolate; the persisted backlog then burned the 30s CPU limit on every wake, so recovery never ran and the session death-looped (re-kicks died in seconds against the same wall).

- exec tool description + both skills now mandate redirect-to-file + tail for noisy commands (installs/builds/tests); /tmp is fine for logs (container-local, never synced).
- pnpm baked into the image (drops the corepack step).
- ThinkAgent.resetSession() + AgentThink.resetSession(session) RPC: operator escape hatch that releases the container assignment, wipes all durable state, and aborts the isolate. It is the only way to unbrick a poisoned session today.
- AGENTS.md edge case documented.

👀 (gh-app) only proves the webhook was seen; the Think turn adds 🚀 the moment it wakes (beforeTurn, direct REST with the installation token, once per dispatch, fire-and-forget). 👀-without-🚀 now cleanly reads as 'turn is dead'.

Per review: the MODEL adds the 🚀 via gh in the container as its first action — proving the whole chain (model loop, container, authed gh) is alive, not just that the turn woke. System-prompt instruction carries the exact comment id; 👀-without-🚀 reads as 'agent dead'.
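
The reaction semantics above can be summarised in a small classifier. This is a sketch only: the status names and the `classifyTurn` function are assumptions, while `eyes` and `rocket` are the reaction content values GitHub uses for 👀 and 🚀.

```typescript
type TurnStatus = "webhook-not-seen" | "agent-dead" | "agent-alive";

// Classify a dispatch from the reactions present on the triggering comment.
// Note: right after dispatch, "eyes" without "rocket" can also mean the
// agent has not reached its first action yet.
function classifyTurn(reactions: string[]): TurnStatus {
  if (reactions.includes("rocket")) {
    // The model added it via gh in the container: loop, container, and auth all work.
    return "agent-alive";
  }
  if (reactions.includes("eyes")) {
    // The webhook was seen, but nothing proves the agent is running.
    return "agent-dead";
  }
  return "webhook-not-seen";
}

console.log(classifyTurn(["eyes"]));
console.log(classifyTurn(["eyes", "rocket"]));
```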

…ale step ref The orphan-branch publish now happens in a scratch git-init dir pushed by URL, so /workspace/repo stays intact for the root-cause hypothesis and follow-up PR work. Fix the '(step 6)' cross-reference left stale by renumbering.

- AGENTS.md described the replaced beforeTurn rocket hook; it now documents the shipped model-driven mechanism (and what 🚀 proves).
- Step cross-references in both skills are by NAME, not number; the numeric ones already broke once in this PR when a step was inserted.
- resetSession's abort delay is a named constant with its rationale.

….md) The model now sees exactly read/write/edit/bash, aligned with pi and Claude Code (far more training data on 'bash' than a bespoke 'exec'). Think's workspace built-ins (list/find/grep/delete) are excluded from the provider request via beforeTurn activeTools (~600 prompt tokens reclaimed per call); ls/grep/rm/find happen through bash. bash (renamed from exec, src/tools/bash.ts) also gains the pi-inspired upgrades: optional timeout (SIGTERM + exit 124, GNU convention), head+tail truncation per stream (the old head-only cut threw away the tail where build errors live), per-backend guidance folded into the backend param's schema, and a slimmed description. read's offset/limit gain sane maxima so zod stops serializing MAX_SAFE_INTEGER noise. TOOLS.md documents the surface, what is deliberately not exposed, the lineage (Aron's fs-tools + pi conventions + Claude Code naming), and the four-bar test for adding a fifth tool.
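
A minimal sketch of head+tail truncation, to show why it preserves the build errors that a head-only cut discards. The function name `truncateHeadTail`, the line-based budget, and the marker text are assumptions; the actual implementation in src/tools/bash.ts may truncate differently.

```typescript
// Keep the first and last lines of `output` within a budget of `maxLines`
// (assumed >= 1), replacing the middle with a single marker line.
function truncateHeadTail(output: string, maxLines: number): string {
  const lines = output.split("\n");
  if (lines.length <= maxLines) {
    return output;
  }
  const head = Math.ceil(maxLines / 2);
  const tail = maxLines - head;
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `[... ${omitted} lines omitted ...]`,
    // slice(-0) would return every line, so guard the zero-tail case.
    ...(tail > 0 ? lines.slice(-tail) : []),
  ].join("\n");
}

const sampleOutput = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"].join("\n");
console.log(truncateHeadTail(sampleOutput, 4));
```

With a budget of 4, the sample keeps lines 1, 2, 9, and 10, so an error printed at the end of a long build log stays visible.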

devin-ai-integration Bot reviewed Jul 3, 2026

mattzcarey changed the title ~~feat(agent-think): reproduce skill publishes the repro as a pullable branch~~ feat(agent-think): repro branches + per-PR live demo deployments in the skills Jul 3, 2026

mattzcarey added 11 commits July 3, 2026 15:06

fix(agent-think): install jq in the container image

b27dfdf

The system prompt and skills advertise jq on the container backend, but the image never installed it (runs hit '/bin/sh: 1: jq: not found').

chore(agent-think): fall back to Kimi K2.7 while unified-billing cred…

c844424

…its are dry Workers AI billing path; the gpt-5.5 gateway-delegate plumbing stays in place for a one-line switch back.

chore(agent-think): drop claim URLs from reports and PRs

4dc58a2

Nobody needs the claim link — reports/PRs now say 'Repro/Demo URL (expires after 60 mins): <url>' and the structured results lose claimUrl/demoClaimUrl.

chore(agent-think): move TOOLS.md to design/tools.md

68dc73e

chore(agent-think): drop the tools design doc

aa5c004

The decisions live in the code comments and pi alignment speaks for itself; the doc was already a drift risk.

mattzcarey merged commit 5ee3bf5 into main Jul 3, 2026
4 checks passed

mattzcarey deleted the feat/reproduce-skill-repro-branch branch July 3, 2026 17:32

mattzcarey merged 13 commits into main from feat/reproduce-skill-repro-branch
