feat(agent-think): repro branches + per-PR live demo deployments in the skills #1868

Merged: mattzcarey merged 13 commits into main from feat/reproduce-skill-repro-branch on Jul 3, 2026

mattzcarey (Contributor) commented Jul 3, 2026

Follow-up to #1861 — skill + hardening batch. Skills are already seeded to the prod R2 bucket and the worker/image changes are deployed (runtime behavior is live; this PR records it in-repo).

reproduce — pushes the finished repro project as an orphan repro/issue-<n> branch on the target repo (the checkout IS the runnable project; re-runs force-push the same branch). Issue report + structured result carry reproBranchUrl so other agents can pull exactly what was built.

open-pr — every fix PR ships with a temporary deployment demoing the change: build + npm pack the fixed package, install the tarball into a minimal Vite demo (or the repro-branch project → broken-before/fixed-after on the same UI), wrangler deploy --temporary, demo URL + claim link in the PR body. Examples/docs touched by the change get updated in the same PR; no-runtime-surface changes skip the demo with a stated reason.

Incident hardening (RCA in-session; see also #1870): a full-monorepo pnpm install streamed unbounded output through the DO → isolate OOM → the persisted backlog burned the CPU limit on every wake, permanently bricking the session.

  • exec tool description + both skills mandate redirect-to-file + tail for noisy commands (/tmp is container-local, never synced).
  • pnpm baked into the container image (drops corepack enable).
  • ThinkAgent.resetSession() + AgentThink.resetSession(session) RPC — operator escape hatch: releases the container assignment, wipes all durable session state, aborts the isolate. RPC-only, not exposed over HTTP.
  • AGENTS.md edge cases updated (WS run_worker_first, deploys reset in-flight turns, unbounded exec output).
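
The redirect-to-file + tail rule in the first bullet can be sketched as a small shell helper. This is an illustration only: `run_quiet`, the log path, and the 20-line tail are assumptions made for the example, not text from the skills or the exec tool description.

```shell
#!/bin/sh
# run_quiet LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not from the PR): runs a noisy command with all of its
# output redirected to LOGFILE, then prints only the last 20 lines.
run_quiet() {
  rq_log=$1
  shift
  rq_status=0
  # stdout and stderr both go to the log, so nothing streams to the caller.
  "$@" >"$rq_log" 2>&1 || rq_status=$?
  # Surface only the tail; the full log stays on disk for later inspection.
  tail -n 20 "$rq_log"
  # Preserve the command's exit status so failures are still visible.
  return "$rq_status"
}
```

A call such as `run_quiet /tmp/install.log pnpm install` would keep the full install output in the container-local `/tmp` and return only a bounded tail to the agent.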

🤖 Generated with Claude Code

…branch

New step: after verification, push the repro project as an orphan
repro/issue-<n> branch on the target repo (checkout IS the runnable
project; re-runs force-push the same branch; tar-copy excludes
node_modules/dist/.wrangler/.env — the image has no rsync). The report
comment and structured result now carry reproBranchUrl so other agents
can pull exactly what was built. Best-effort: a protected-branch
rejection is reported, not fatal.

changeset-bot bot commented Jul 3, 2026

⚠️ No Changeset found

Latest commit: aa5c004

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

devin-ai-integration bot left a comment

Devin Review found 2 potential issues.

`reproBranchUrl`. If the push is rejected (branch protection), say so in the
report and continue — the branch is best-effort, the report is not.

## 7. Report back on the issue

🟡 Step cross-reference in the instructions now points to the wrong step after renumbering

The report step is renumbered from 6 to 7 (## 7. Report back on the issue at agent-think/skills/reproduce/SKILL.md:291) but an earlier instruction still says "include the URL + click instructions in your report (step 6)" (agent-think/skills/reproduce/SKILL.md:87), so the agent following these instructions is directed to the wrong step.

Impact: The AI agent may look at the wrong step ("Push the repro to a branch") instead of the reporting step when following the cross-reference.

Stale cross-reference from step renumbering

Line 87 reads:

  5. After deploy, confirm the root URL serves the page (step 5) and include the URL + click instructions in your report (step 6).

Before this PR, step 6 was "Report back on the issue". The PR inserts a new step 6 ("Push the repro to a branch") and renumbers the old step 6 to step 7, but does not update the "(step 6)" reference at line 87 to "(step 7)".

Prompt for agents
Line 87 of agent-think/skills/reproduce/SKILL.md contains the text '(step 6)' which was a cross-reference to the 'Report back on the issue' step. Since that step was renumbered from 6 to 7 in this PR, the reference at line 87 needs to be updated from '(step 6)' to '(step 7)'. The line reads: '5. After deploy, confirm the root URL serves the page (step 5) and include the URL + click instructions in your report (step 6).' Change '(step 6)' to '(step 7)'.

Reply from the PR author:

Fixed in eaf5997 — the reference now points at step 7.

Comment thread agent-think/skills/reproduce/SKILL.md Outdated
Comment on lines +274 to +281
```sh
git checkout --orphan repro/issue-<issueNumber>
git rm -rfq --cached . && git clean -fdq
tar -C "$REPRO_DIR" --exclude node_modules --exclude dist \
  --exclude .wrangler --exclude .env -cf - . | tar -xf -
git add -A
git commit -m "repro for #<issueNumber>: <one-line issue title>"
git push -f origin repro/issue-<issueNumber>
```

🚩 Destructive git operations on the repo checkout may prevent re-use

The new step 6 runs git rm -rfq --cached . && git clean -fdq on /workspace/repo (agent-think/skills/reproduce/SKILL.md:275), which wipes the working tree of the cloned repository to prepare the orphan branch. This is intentional for creating a clean orphan branch, but it means the original repo checkout is destroyed after this step. If the agent needs to reference source files from the repo for the root-cause hypothesis in step 7 (e.g., pointing at suspect file/line in packages/), the files would no longer be available. The step ordering places this after verification (step 5), so the repo has already been used for understanding the code (step 2), but the reporting step (step 7) asks for a root-cause hypothesis pointing at specific files — the agent would need to have noted those before step 6 destroys the checkout.

Reply from the PR author:

Fixed in eaf5997 — the publish now happens in a scratch git init dir pushed by URL (repro-publish-<n>, removed afterwards), so /workspace/repo is never touched and stays available for the root-cause hypothesis and follow-up PR work.
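
The scratch-directory publish described in this reply might look roughly like the sketch below. It is not the text of SKILL.md: the function name `publish_repro`, the commit identity, and the commit message are assumptions made for illustration, while the tar excludes and the `repro/issue-<n>` branch name follow the snippet quoted in the review thread.

```shell
#!/bin/sh
# publish_repro REPRO_DIR ISSUE_NUMBER REMOTE_URL
# Hypothetical sketch of the scratch-directory publish; the function name,
# commit identity, and commit message are assumptions, not the skill's text.
publish_repro() {
  pr_src=$1
  pr_issue=$2
  pr_remote=$3
  pr_status=0
  # Scratch dir outside the repo checkout, so the clone is never modified.
  pr_scratch=$(mktemp -d "${TMPDIR:-/tmp}/repro-publish-$pr_issue.XXXXXX") || return 1
  # Copy the project without dependencies, build output, or secrets.
  tar -C "$pr_src" --exclude node_modules --exclude dist \
    --exclude .wrangler --exclude .env -cf - . | tar -C "$pr_scratch" -xf - || pr_status=$?
  if [ "$pr_status" -eq 0 ]; then
    {
      git -C "$pr_scratch" init -q &&
      git -C "$pr_scratch" add -A &&
      git -C "$pr_scratch" -c user.name="repro-bot" \
        -c user.email="repro-bot@example.invalid" \
        commit -q -m "repro for #$pr_issue" &&
      # A fresh repository has no history, so the pushed branch is effectively
      # an orphan; -f lets re-runs overwrite the same remote branch.
      git -C "$pr_scratch" push -q -f "$pr_remote" \
        "HEAD:refs/heads/repro/issue-$pr_issue"
    } || pr_status=$?
  fi
  # Remove the scratch dir whether or not the push succeeded.
  rm -rf "$pr_scratch"
  return "$pr_status"
}
```

Because the push targets a URL rather than a configured remote, nothing in the original checkout (working tree, index, or remotes) changes. A non-zero return can be reported as a best-effort failure instead of aborting the run.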

pkg-pr-new bot commented Jul 3, 2026

agents

npm i https://pkg.pr.new/agents@1868

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1868

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1868

create-think

npm i https://pkg.pr.new/create-think@1868

hono-agents

npm i https://pkg.pr.new/hono-agents@1868

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1868

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1868

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1868

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1868

commit: aa5c004

Every fix PR now includes a temporary deployment demoing the change:
build+pack the fixed package, install the tarball into a minimal Vite
demo (or reuse the repro/issue-<n> branch project), deploy --temporary,
and put the demo URL + claim link in the PR body. Also: keep examples/
docs truthful in the same PR when the change affects them. Docs-only /
no-runtime changes skip the demo with a stated reason.

mattzcarey changed the title from "feat(agent-think): reproduce skill publishes the repro as a pullable branch" to "feat(agent-think): repro branches + per-PR live demo deployments in the skills" on Jul 3, 2026
mattzcarey added 11 commits July 3, 2026 15:06
RCA (PLANS/agents/agent-think-1845-rca.md): a full-monorepo pnpm
install streamed ~8 min of output through the DO, OOMing the isolate;
the persisted backlog then burned the 30s CPU limit on every wake, so
recovery never ran and the session death-looped — re-kicks died in
seconds against the same wall.

- exec tool description + both skills now mandate redirect-to-file +
  tail for noisy commands (installs/builds/tests); /tmp is fine for
  logs (container-local, never synced).
- pnpm baked into the image (drops the corepack step).
- ThinkAgent.resetSession() + AgentThink.resetSession(session) RPC:
  operator escape hatch that releases the container assignment, wipes
  all durable state, and aborts the isolate — the only way to unbrick
  a poisoned session today.
- AGENTS.md edge case documented.

The system prompt and skills advertise jq on the container backend,
but the image never installed it (runs hit '/bin/sh: 1: jq: not
found').

👀 (gh-app) only proves the webhook was seen; the Think turn adds 🚀
the moment it wakes (beforeTurn, direct REST with the installation
token, once per dispatch, fire-and-forget). 👀-without-🚀 now cleanly
reads as 'turn is dead'.

Per review: the MODEL adds the 🚀 via gh in the container as its first
action, proving the whole chain (model loop, container, authed gh) is
alive, not just that the turn woke. System-prompt instruction carries
the exact comment id; 👀-without-🚀 reads as 'agent dead'.

…its are dry

Workers AI billing path; the gpt-5.5 gateway-delegate plumbing stays in
place for a one-line switch back.

…ale step ref

The orphan-branch publish now happens in a scratch git-init dir pushed
by URL, so /workspace/repo stays intact for the root-cause hypothesis
and follow-up PR work. Fix the '(step 6)' cross-reference left stale by
renumbering.

Nobody needs the claim link; reports/PRs now say
'Repro/Demo URL (expires after 60 mins): <url>' and the structured
results lose claimUrl/demoClaimUrl.

- AGENTS.md described the replaced beforeTurn rocket hook; it now
  documents the shipped model-driven mechanism (and what 🚀 proves).
- Step cross-references in both skills are by NAME, not number; the
  numeric ones already broke once in this PR when a step was inserted.
- resetSession's abort delay is a named constant with its rationale.

….md)

The model now sees exactly read/write/edit/bash, aligned with pi and
Claude Code (far more training data on 'bash' than a bespoke 'exec').
Think's workspace built-ins (list/find/grep/delete) are excluded from
the provider request via beforeTurn activeTools (~600 prompt tokens
reclaimed per call); ls/grep/rm/find happen through bash.

bash (renamed from exec, src/tools/bash.ts) also gains the pi-inspired
upgrades: optional timeout (SIGTERM + exit 124, GNU convention),
head+tail truncation per stream (the old head-only cut threw away the
tail where build errors live), per-backend guidance folded into the
backend param's schema, and a slimmed description. read's offset/limit
gain sane maxima so zod stops serializing MAX_SAFE_INTEGER noise.
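
The head+tail truncation idea can be illustrated with a small shell filter. This is only a sketch, not the code in src/tools/bash.ts: the name `head_tail`, the line-based limits, and the marker text are assumptions, since the commit message does not state the real tool's units or format.

```shell
#!/bin/sh
# head_tail HEAD_LINES TAIL_LINES
# Hypothetical filter: keeps the first HEAD_LINES and last TAIL_LINES lines of
# stdin and replaces the middle with a marker. Line-based for simplicity; it
# counts newline-terminated lines.
head_tail() {
  ht_head=$1
  ht_tail=$2
  ht_tmp=$(mktemp) || return 1
  # Buffer the stream so both ends can be read after the command finishes.
  cat >"$ht_tmp"
  ht_total=$(wc -l <"$ht_tmp" | tr -d ' ')
  if [ "$ht_total" -le $((ht_head + ht_tail)) ]; then
    # Short output passes through untouched.
    cat "$ht_tmp"
  else
    head -n "$ht_head" "$ht_tmp"
    echo "[... $((ht_total - ht_head - ht_tail)) lines truncated ...]"
    # The tail is kept because that is where build errors usually appear.
    tail -n "$ht_tail" "$ht_tmp"
  fi
  rm -f "$ht_tmp"
}
```

Applied once per stream, a filter like this bounds what reaches the model while still showing both how a command started and how it ended, which a head-only cut cannot do.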

TOOLS.md documents the surface, what is deliberately not exposed, the
lineage (Aron's fs-tools + pi conventions + Claude Code naming), and
the four-bar test for adding a fifth tool.

The decisions live in the code comments and pi alignment speaks for
itself; the doc was already a drift risk.

mattzcarey merged commit 5ee3bf5 into main on Jul 3, 2026
4 checks passed
mattzcarey deleted the feat/reproduce-skill-repro-branch branch on July 3, 2026 17:32