feat(agent-think): repro branches + per-PR live demo deployments in the skills #1868
…branch New step: after verification, push the repro project as an orphan repro/issue-<n> branch on the target repo (checkout IS the runnable project; re-runs force-push the same branch; tar-copy excludes node_modules/dist/.wrangler/.env — the image has no rsync). The report comment and structured result now carry reproBranchUrl so other agents can pull exactly what was built. Best-effort: a protected-branch rejection is reported, not fatal.
`reproBranchUrl`. If the push is rejected (branch protection), say so in the
report and continue: the branch is best-effort, the report is not.

## 7. Report back on the issue
🟡 Step cross-reference in the instructions now points to the wrong step after renumbering
The report step is renumbered from 6 to 7 (## 7. Report back on the issue at agent-think/skills/reproduce/SKILL.md:291) but an earlier instruction still says "include the URL + click instructions in your report (step 6)" (agent-think/skills/reproduce/SKILL.md:87), so the agent following these instructions is directed to the wrong step.
Impact: The AI agent may look at the wrong step ("Push the repro to a branch") instead of the reporting step when following the cross-reference.
Stale cross-reference from step renumbering
Line 87 reads:
- After deploy, confirm the root URL serves the page (step 5) and include the URL + click instructions in your report (step 6).
Before this PR, step 6 was "Report back on the issue". The PR inserts a new step 6 ("Push the repro to a branch") and renumbers the old step 6 to step 7, but does not update the "(step 6)" reference at line 87 to "(step 7)".
Prompt for agents
Line 87 of agent-think/skills/reproduce/SKILL.md contains the text '(step 6)' which was a cross-reference to the 'Report back on the issue' step. Since that step was renumbered from 6 to 7 in this PR, the reference at line 87 needs to be updated from '(step 6)' to '(step 7)'. The line reads: '5. After deploy, confirm the root URL serves the page (step 5) and include the URL + click instructions in your report (step 6).' Change '(step 6)' to '(step 7)'.
Fixed in eaf5997 — the reference now points at step 7.
```
git checkout --orphan repro/issue-<issueNumber>
git rm -rfq --cached . && git clean -fdq
tar -C "$REPRO_DIR" --exclude node_modules --exclude dist \
  --exclude .wrangler --exclude .env -cf - . | tar -xf -
git add -A
git commit -m "repro for #<issueNumber>: <one-line issue title>"
git push -f origin repro/issue-<issueNumber>
```
🚩 Destructive git operations on the repo checkout may prevent re-use
The new step 6 runs git rm -rfq --cached . && git clean -fdq on /workspace/repo (agent-think/skills/reproduce/SKILL.md:275), which wipes the working tree of the cloned repository to prepare the orphan branch. This is intentional for creating a clean orphan branch, but it means the original repo checkout is destroyed after this step. If the agent needs to reference source files from the repo for the root-cause hypothesis in step 7 (e.g., pointing at suspect file/line in packages/), the files would no longer be available. The step ordering places this after verification (step 5), so the repo has already been used for understanding the code (step 2), but the reporting step (step 7) asks for a root-cause hypothesis pointing at specific files — the agent would need to have noted those before step 6 destroys the checkout.
Fixed in eaf5997 — the publish now happens in a scratch git init dir pushed by URL (repro-publish-<n>, removed afterwards), so /workspace/repo is never touched and stays available for the root-cause hypothesis and follow-up PR work.
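The scratch-directory publish described in this reply could look roughly like the sketch below. It is not the code from eaf5997: the function name `publish_repro`, the bot identity, and the extra `--exclude .git` are assumptions made for illustration. Only the tar excludes, the `repro-publish-<n>` naming, the force-push, and the "push by URL, then remove the scratch dir" behavior come from the discussion above.

```shell
#!/bin/sh
# Hypothetical helper: publish_repro <repro_dir> <remote_url> <issue_number>
# Builds a one-commit history in a throwaway directory and force-pushes it,
# so the main repo checkout is never modified.
publish_repro() {
  pr_src=$(cd "$1" && pwd) || return 1   # absolute path; we cd away below
  pr_remote=$2
  pr_issue=$3
  pr_scratch=$(mktemp -d "${TMPDIR:-/tmp}/repro-publish-$pr_issue.XXXXXX") || return 1
  (
    cd "$pr_scratch" &&
    git init -q &&
    # tar-copy instead of rsync; --exclude .git is an added assumption so a
    # nested repository cannot overwrite the scratch repository's metadata.
    tar -C "$pr_src" --exclude node_modules --exclude dist \
        --exclude .wrangler --exclude .env --exclude .git -cf - . | tar -xf - &&
    git add -A &&
    git -c user.name="repro-bot" -c user.email="repro-bot@example.invalid" \
        commit -q -m "repro for #$pr_issue" &&
    # Push by URL; re-runs overwrite the same branch.
    git push -q -f "$pr_remote" "HEAD:refs/heads/repro/issue-$pr_issue"
  )
  pr_status=$?
  rm -rf "$pr_scratch"                   # scratch dir is removed either way
  return $pr_status
}
```

Pushing `HEAD:refs/heads/...` avoids depending on the scratch repository's default branch name, and a non-zero return lets the caller report a rejected push without treating it as fatal.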
Every fix PR now includes a temporary deployment demoing the change: build+pack the fixed package, install the tarball into a minimal Vite demo (or reuse the repro/issue-<n> branch project), deploy --temporary, and put the demo URL + claim link in the PR body. Also: keep examples/ docs truthful in the same PR when the change affects them. Docs-only / no-runtime changes skip the demo with a stated reason.
RCA (PLANS/agents/agent-think-1845-rca.md): a full-monorepo pnpm install streamed ~8 min of output through the DO, OOMing the isolate; the persisted backlog then burned the 30s CPU limit on every wake, so recovery never ran and the session death-looped (re-kicks died in seconds against the same wall).
- exec tool description + both skills now mandate redirect-to-file + tail for noisy commands (installs/builds/tests); /tmp is fine for logs (container-local, never synced).
- pnpm baked into the image (drops the corepack step).
- ThinkAgent.resetSession() + AgentThink.resetSession(session) RPC: operator escape hatch that releases the container assignment, wipes all durable state, and aborts the isolate; the only way to unbrick a poisoned session today.
- AGENTS.md edge case documented.
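The redirect-to-file + tail pattern mandated here can be sketched as a small wrapper. The name `run_quiet` and the default of 20 lines are assumptions for illustration; the skills may phrase the rule differently.

```shell
#!/bin/sh
# Hypothetical helper: run_quiet <logfile> <command> [args...]
# Sends all output of a noisy command to a log file and prints only the
# last few lines, so the caller never streams the full output.
run_quiet() {
  rq_log=$1
  shift
  "$@" >"$rq_log" 2>&1
  rq_status=$?
  tail -n "${TAIL_LINES:-20}" "$rq_log"
  return $rq_status                      # preserve the command's exit code
}

# Example (not executed here): run_quiet /tmp/install.log pnpm install
```

The full log stays on disk for later inspection with `grep` or `tail`, while the amount of text that flows back through the tool call is bounded.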
The system prompt and skills advertise jq on the container backend, but the image never installed it (runs hit '/bin/sh: 1: jq: not found').
👀 (gh-app) only proves the webhook was seen; the Think turn adds 🚀 the moment it wakes (beforeTurn, direct REST with the installation token, once per dispatch, fire-and-forget). 👀-without-🚀 now cleanly reads as 'turn is dead'.
Per review: the MODEL adds the 🚀 via gh in the container as its first action — proving the whole chain (model loop, container, authed gh) is alive, not just that the turn woke. System-prompt instruction carries the exact comment id; 👀-without-🚀 reads as 'agent dead'.
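As an illustration of the model's first action, a 🚀 reaction on the triggering comment can be added with `gh api` against GitHub's comment-reactions endpoint. This is a sketch only: the function name `add_rocket` is hypothetical, and the exact command in the system prompt is not shown in this PR.

```shell
#!/bin/sh
# Hypothetical helper: add_rocket <comment_id>
# Adds a "rocket" reaction to an issue comment via the GitHub REST API.
# {owner} and {repo} are gh placeholders resolved from the current repository.
add_rocket() {
  gh api --method POST \
    -H "Accept: application/vnd.github+json" \
    "repos/{owner}/{repo}/issues/comments/$1/reactions" \
    -f content=rocket
}
```

Because the call needs a working container, an authenticated `gh`, and a running model loop, a successful reaction is evidence that all three are alive.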
…its are dry Workers AI billing path; the gpt-5.5 gateway-delegate plumbing stays in place for a one-line switch back.
…ale step ref The orphan-branch publish now happens in a scratch git-init dir pushed by URL, so /workspace/repo stays intact for the root-cause hypothesis and follow-up PR work. Fix the '(step 6)' cross-reference left stale by renumbering.
Nobody needs the claim link — reports/PRs now say 'Repro/Demo URL (expires after 60 mins): <url>' and the structured results lose claimUrl/demoClaimUrl.
- AGENTS.md described the replaced beforeTurn rocket hook; it now documents the shipped model-driven mechanism (and what 🚀 proves).
- Step cross-references in both skills are by NAME, not number; the numeric ones already broke once in this PR when a step was inserted.
- resetSession's abort delay is a named constant with its rationale.
….md) The model now sees exactly read/write/edit/bash, aligned with pi and Claude Code (far more training data on 'bash' than a bespoke 'exec'). Think's workspace built-ins (list/find/grep/delete) are excluded from the provider request via beforeTurn activeTools (~600 prompt tokens reclaimed per call); ls/grep/rm/find happen through bash. bash (renamed from exec, src/tools/bash.ts) also gains the pi-inspired upgrades: optional timeout (SIGTERM + exit 124, GNU convention), head+tail truncation per stream (the old head-only cut threw away the tail where build errors live), per-backend guidance folded into the backend param's schema, and a slimmed description. read's offset/limit gain sane maxima so zod stops serializing MAX_SAFE_INTEGER noise. TOOLS.md documents the surface, what is deliberately not exposed, the lineage (Aron's fs-tools + pi conventions + Claude Code naming), and the four-bar test for adding a fifth tool.
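The head+tail truncation idea can be sketched in shell. The real implementation lives in src/tools/bash.ts and is not shown here; the function name `truncate_head_tail`, the byte-based budget, and the marker text are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: truncate_head_tail <max_bytes> <file>
# Keeps the first and last max_bytes/2 bytes of a captured stream and
# replaces the middle with a marker, so trailing build errors survive.
truncate_head_tail() {
  tht_max=$1
  tht_file=$2
  tht_size=$(( $(wc -c <"$tht_file") ))
  if [ "$tht_size" -le "$tht_max" ]; then
    cat "$tht_file"                      # small enough: pass through as-is
    return
  fi
  tht_half=$(( tht_max / 2 ))
  head -c "$tht_half" "$tht_file"
  printf '\n[... %d bytes truncated ...]\n' "$(( tht_size - 2 * tht_half ))"
  tail -c "$tht_half" "$tht_file"
}
```

Applied once to captured stdout and once to captured stderr, this gives the per-stream behavior described above; a head-only cut would have dropped the final lines where compilers and test runners usually print the failure.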
The decisions live in the code comments and pi alignment speaks for itself; the doc was already a drift risk.
Follow-up to #1861 — skill + hardening batch. Skills are already seeded to the prod R2 bucket and the worker/image changes are deployed (runtime behavior is live; this PR records it in-repo).
reproduce: pushes the finished repro project as an orphan `repro/issue-<n>` branch on the target repo (the checkout IS the runnable project; re-runs force-push the same branch). Issue report + structured result carry `reproBranchUrl` so other agents can pull exactly what was built.

open-pr: every fix PR ships with a temporary deployment demoing the change. Build + `npm pack` the fixed package, install the tarball into a minimal Vite demo (or the repro-branch project → broken-before/fixed-after on the same UI), `wrangler deploy --temporary`, then put the demo URL + claim link in the PR body. Examples/docs touched by the change get updated in the same PR; no-runtime-surface changes skip the demo with a stated reason.

Incident hardening (RCA in-session; see also #1870): a full-monorepo `pnpm install` streamed unbounded output through the DO → isolate OOM → the persisted backlog burned the CPU limit on every wake, permanently bricking the session.
- Noisy commands (installs/builds/tests) must redirect output to a file and `tail` it (`/tmp` is container-local, never synced).
- `pnpm` baked into the container image (drops `corepack enable`).
- `ThinkAgent.resetSession()` + `AgentThink.resetSession(session)` RPC: an operator escape hatch that releases the container assignment, wipes all durable session state, and aborts the isolate. RPC-only, not exposed over HTTP.

🤖 Generated with Claude Code