diff --git a/pstack/.cursor-plugin/plugin.json b/pstack/.cursor-plugin/plugin.json index 449814e..b250bdf 100644 --- a/pstack/.cursor-plugin/plugin.json +++ b/pstack/.cursor-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "pstack", "displayName": "pstack", - "version": "0.7.0", + "version": "0.8.0", "description": "if you want to go fast, go deep first. pstack helps you write less, but higher quality code. rigorous agent workflows you can parallelize with confidence.", "author": { "name": "Lauren Tan" diff --git a/pstack/README.md b/pstack/README.md index b134691..2f1f9ac 100644 --- a/pstack/README.md +++ b/pstack/README.md @@ -24,6 +24,8 @@ fork it. improve it. make it yours. PRs are welcome! type `/automate-me`. it mines your recent transcripts, drafts a `-mode` skill from how you've actually worked, and routes through pstack underneath. you keep pstack as the base and end up with your own routing skill alongside `poteto-mode`. +models are configurable too. type `/setup-pstack`. it detects the models you have access to and writes a small always-applied rule mapping each role (code, judgment, the review panels) to a model. every skill reads it and falls back to sensible defaults when the rule is absent, so you override only what you want. + ## usage use `/poteto-mode` at the start of a task. it reads your request, picks from a set of playbooks, and runs the other skills as the steps need them. @@ -73,6 +75,7 @@ the rest are useful when you want to specifically invoke them: | `/arena` | you want N parallel attempts at the same thing, then to grab the best parts of each. | | `/interrogate` | you have a diff and want four different models to try to break it, including a strict code-quality lens. | | `/automate-me` | you want your own `-mode` skill, drafted from how you've actually worked. | +| `/setup-pstack` | you want to pick which models pstack uses per role. detects your models and writes a config rule. 
| | `/reflect` | a long task landed and you want the recipe captured as a skill edit. | | `/tdd` | you're fixing a bug and there's a cheap local test path. write the failing test first, then the fix. | | `/typescript-best-practices` | you're reading or editing typescript. grounds the type-system-discipline principle in syntax. | diff --git a/pstack/skills/architect/SKILL.md b/pstack/skills/architect/SKILL.md index 6c3b4db..bd23951 100644 --- a/pstack/skills/architect/SKILL.md +++ b/pstack/skills/architect/SKILL.md @@ -30,7 +30,7 @@ Skip Phase A only when the work is genuinely greenfield with no surrounding syst Run the **arena** skill with the design-sketch task and the Phase A grounding artifacts. Pass `references/runner-prompt.md` as each runner's prompt. Each candidate produces a design package shaped per `references/rationale-template.md`: the caller's usage written first, then the type sketch, function signatures, module map, and prose rationale derived from it. -Use these runner slugs: `claude-opus-4-8-thinking-xhigh`, `gpt-5.3-codex-high-fast`, `gpt-5.5-high-fast`, and `composer-2.5-fast`. +Use your configured architect runners (defaults `claude-opus-4-8-thinking-xhigh`, `gpt-5.5-high-fast`, `composer-2.5-fast`). This is the **exhaust-the-design-space** principle skill made concrete. Whole-shape alternatives, not point fixes inside one shape. diff --git a/pstack/skills/architect/references/runner-prompt.md b/pstack/skills/architect/references/runner-prompt.md index 342bb87..75593cf 100644 --- a/pstack/skills/architect/references/runner-prompt.md +++ b/pstack/skills/architect/references/runner-prompt.md @@ -16,4 +16,4 @@ Apply the following discipline. The orchestrator compares candidates on these ax - Idempotent state transitions where applicable, per the **make-operations-idempotent** principle skill. Ask what happens if the operation runs twice or crashes halfway. - Short call chains. 
If tracing the flow needs more than three files, flatten the hierarchy, per the **laziness-protocol** and **minimize-reader-load** principle skills. -You are one of four runners on different models. Produce the best design your model can make; don't hedge against the others. Differences between candidates are the signal used to pick a base and graft. Converging on a safe-looking middle defeats the exploration. +You are one of several runners, each on a different model. Produce the best design your model can make; don't hedge against the others. Differences between candidates are the signal used to pick a base and graft. Converging on a safe-looking middle defeats the exploration. diff --git a/pstack/skills/arena/SKILL.md b/pstack/skills/arena/SKILL.md index 8b3da49..4e59d39 100644 --- a/pstack/skills/arena/SKILL.md +++ b/pstack/skills/arena/SKILL.md @@ -25,7 +25,7 @@ The N candidates will receive the same prompt, so the prompt is the contract. Ge 1. State the artifact each candidate is producing. 2. Derive the rubric. State what success looks like for *this* task, then turn it into 3-6 concrete gradeable criteria. Concrete: `Adds a --dry-run flag that skips writes`. Vague: `code is correct`. The rubric is the picker's tool in Phase D; candidates only see the task. -3. Pick the runners. Default 4: `claude-opus-4-8-thinking-xhigh`, `gpt-5.3-codex-high-fast`, `gpt-5.5-high-fast`, and `composer-2.5-fast`. Spawn more when the arena covers multiple design directions. Same model N times when the work is generation-bound rather than judgment-sensitive. +3. Pick the runners. Default runners are your configured arena list (defaults `claude-opus-4-8-thinking-xhigh`, `gpt-5.5-high-fast`, `composer-2.5-fast`). Spawn more when the arena covers multiple design directions. Same model N times when the work is generation-bound rather than judgment-sensitive. 4. Assign output paths. 
Each candidate writes to its own location (a git worktree where possible, otherwise `/tmp/arena-/candidate-/`). N candidates writing to the same path is shared mutable state and fails the **separate-before-serializing-shared-state** principle skill test. ## Phase B: Fan out diff --git a/pstack/skills/how/SKILL.md b/pstack/skills/how/SKILL.md index e2657e1..7f89788 100644 --- a/pstack/skills/how/SKILL.md +++ b/pstack/skills/how/SKILL.md @@ -45,7 +45,7 @@ The right decomposition depends on the question. Use your judgment. Narrow quest Spawn all explorers in a single message: - `subagent_type`: `generalPurpose` -- `model`: `composer-2.5-fast` +- `model`: your configured how-explorer model (default `composer-2.5-fast`) - `readonly`: `true` Each explorer gets the same base prompt from `references/explorer-prompt.md` plus a specific exploration angle naming its slice. Each explorer should: @@ -64,7 +64,7 @@ Then proceed to Step 3. Spawn a single Task subagent that explores and explains in one pass: - `subagent_type`: `generalPurpose` -- `model`: `claude-opus-4-8-thinking-xhigh` +- `model`: your configured how-explainer model (default `claude-opus-4-8-thinking-xhigh`) - `readonly`: `true` The agent does its own exploration (Glob, Grep, Read) and writes the explanation directly. Read `references/explainer-prompt.md` for the communication style and output format. Same structure, just no explorer findings as input. @@ -76,7 +76,7 @@ Proceed to Step 4. Once all explorers return, spawn a single Task subagent to synthesize their findings into one coherent explanation: - `subagent_type`: `generalPurpose` -- `model`: `claude-opus-4-8-thinking-xhigh` +- `model`: your configured how-explainer model (default `claude-opus-4-8-thinking-xhigh`) - `readonly`: `true` The explainer gets all explorers' findings and writes the human-facing explanation (output format below). Read `references/explainer-prompt.md` for the full prompt template. 
The explainer reconciles overlapping findings, resolves contradictions, and weaves the slices into a unified picture. @@ -109,17 +109,11 @@ Run the full explain flow above (Steps 1-4). You must understand the architectur ### Step 2. Spawn Critics -After the explanation is complete, spawn architectural critics. Launch all in a single message: - -| Subagent | Model | -|----------|-------| -| Critic A | `claude-opus-4-8-thinking-xhigh` | -| Critic B | `gpt-5.3-codex-high-fast` | -| Critic C | `gpt-5.5-high-fast` | +After the explanation is complete, spawn one architectural critic per model in your configured how-critics list (defaults `claude-opus-4-8-thinking-xhigh`, `gpt-5.5-high-fast`, `composer-2.5-fast`), all in a single message. For each critic: - `subagent_type`: `generalPurpose` -- `model`: the model from the table. These are minimum reasoning levels. The lead should escalate any model when the architecture warrants deeper analysis. +- `model`: one model from the configured how-critics list. These are minimum reasoning levels. The lead should escalate any model when the architecture warrants deeper analysis. - `readonly`: `true` Read `references/critic-prompt.md` for the prompt template. Each critic gets: diff --git a/pstack/skills/interrogate/SKILL.md b/pstack/skills/interrogate/SKILL.md index 53c0be7..071e43f 100644 --- a/pstack/skills/interrogate/SKILL.md +++ b/pstack/skills/interrogate/SKILL.md @@ -1,12 +1,12 @@ --- name: interrogate -description: "Use for \"interrogate\", \"adversarial review\", \"multi-model review\", \"challenge this\", \"stress test this code\", \"find blind spots\", or \"tear this apart\". Four LLM reviewers challenge changes from independent angles." +description: "Use for \"interrogate\", \"adversarial review\", \"multi-model review\", \"challenge this\", \"stress test this code\", \"find blind spots\", or \"tear this apart\". Multiple LLM reviewers challenge changes from independent angles." 
disable-model-invocation: true --- # Interrogate -Spawn four reviewers on four different models to adversarially review code changes. Each model gets the same prompt and rubric. The adversarial signal comes from model diversity, not assigned personas. Models differ in blind spots, priors, and reasoning patterns. Agreement across models is high-confidence signal; lone-model findings are worth reading but lower confidence. +Spawn one reviewer per configured model to adversarially review code changes. Each model gets the same prompt and rubric. The adversarial signal comes from model diversity, not assigned personas. Models differ in blind spots, priors, and reasoning patterns. Agreement across models is high-confidence signal; lone-model findings are worth reading but lower confidence. The deliverable is a synthesized verdict. Do NOT auto-apply changes. @@ -33,21 +33,14 @@ Write one clear paragraph. Reviewers challenge whether the work achieves the int ## Step 3, Spawn Reviewers -Launch all four in a single message using the Task tool, each with a different model. - -| Subagent | Model | -|----------|-------| -| Reviewer A | `claude-opus-4-8-thinking-xhigh` | -| Reviewer B | `gpt-5.3-codex-high-fast` | -| Reviewer C | `gpt-5.5-high-fast` | -| Reviewer D | `composer-2.5-fast` | +Launch one reviewer per model in your configured interrogate list (defaults `claude-opus-4-8-thinking-xhigh`, `gpt-5.5-high-fast`, `composer-2.5-fast`), all in a single message. For each reviewer: - `subagent_type`: `generalPurpose` -- `model`: the model from the table +- `model`: one model from the configured interrogate list - `readonly`: `true` -If a model slug in the table is rejected as unresolvable when you try to spawn the subagent, check the valid slugs in the Task tool's error message, pick the closest equivalent (prefer the highest-reasoning tier of the same family), spawn with the valid slug, and open a separate PR to update this table. Do not block the review on the slug issue. 
+If a configured model slug is rejected as unresolvable when you try to spawn the subagent, check the valid slugs in the Task tool's error message, pick the closest equivalent (prefer the highest-reasoning tier of the same family), spawn with the valid slug, and open a separate PR to update the configured defaults. Do not block the review on the slug issue. Read `references/reviewer-prompt.md` and fill in the template with: 1. The stated intent @@ -55,7 +48,7 @@ Read `references/reviewer-prompt.md` and fill in the template with: 3. The review rubric from `references/rubric.md` 4. The code-quality lens from `references/code-quality-review.md` -The same filled template goes to all four reviewers, so every model applies the code-quality lens. +The same filled template goes to all reviewers, so every model applies the code-quality lens. Each reviewer produces structured findings as described in the prompt template. @@ -63,7 +56,7 @@ Each reviewer produces structured findings as described in the prompt template. As results come back, build a unified picture: -1. **Parse all findings** from the four reviewers +1. **Parse all findings** from the reviewers 2. **Identify consensus**. Findings raised by 2+ models independently are highest signal. 3. **Identify lone-model findings**. Still worth reading, but weight accordingly. 4. **Deduplicate**. Different models may describe the same issue differently. Merge these and note which models raised it. @@ -75,7 +68,7 @@ You are the lead reviewer, a pragmatic senior engineer, not a neutral aggregator Read `references/lead-judgment.md` for the full framework. Reviewers only see a slice of the codebase. You have the full context (the goal, the constraints, the timeline, which tradeoffs were already considered). Use that context aggressively. -Categorize every finding into one of four buckets: +Categorize every finding using these buckets: - **Act on**. 
Real issues affecting correctness, security, or maintainability given the actual goals. These would block a real PR. - **Consider**. Legitimate points, but you're not sure they outweigh the cost of addressing them right now. Worth the user's attention. @@ -95,10 +88,7 @@ Present the verdict in this structure: > [The stated intent paragraph from Step 2] ### Reviewers -- Model A: [model name], [N findings] -- Model B: [model name], [N findings] -- Model C: [model name], [N findings] -- Model D: [model name], [N findings] +List each reviewer on its own line like `- : [N findings]` ### Act On [Findings that should be addressed. For each: description, which models raised it, why it matters.] diff --git a/pstack/skills/interrogate/references/lead-judgment.md b/pstack/skills/interrogate/references/lead-judgment.md index d03f08e..9977511 100644 --- a/pstack/skills/interrogate/references/lead-judgment.md +++ b/pstack/skills/interrogate/references/lead-judgment.md @@ -1,6 +1,6 @@ # Lead Judgment Framework -You are the lead reviewer. The four model reviewers have produced their findings. Apply pragmatic engineering judgment. Don't aggregate; filter, contextualize, and decide. +You are the lead reviewer. The model reviewers have produced their findings. Apply pragmatic engineering judgment. Don't aggregate; filter, contextualize, and decide. ## Why This Step Matters diff --git a/pstack/skills/poteto-mode/SKILL.md b/pstack/skills/poteto-mode/SKILL.md index 4f70a5f..8562c1c 100644 --- a/pstack/skills/poteto-mode/SKILL.md +++ b/pstack/skills/poteto-mode/SKILL.md @@ -21,7 +21,7 @@ Remaining triggers: - Before commit → the `deslop` skill from the `cursor-team-kit` plugin (`/deslop`). - Shipping UI / IDE / CLI → the matching control skill. `cursor-team-kit` publishes `control-cli` (CLIs and TUIs) and `control-ui` (browser / Electron / web UIs). For bug fixes, reproduce first on the same surface yourself; hand to the user only under the narrow Bug fix step 1 exception. 
- After opening a PR → Cursor's built-in **babysit** skill. -- Bugbot or the agentic security reviewer commented → skeptical posture. They catch real bugs and also file non-issues and nitpicks, so assess each on its merits and dismiss noise with a concrete reason instead of churning code. Triage fix / dismiss / ask via the built-in **babysit** skill. +- Bugbot or the agentic security review commented → skeptical posture. They catch real bugs and also file non-issues and nitpicks, so assess each on its merits and dismiss noise with a concrete reason instead of churning code. Triage fix / dismiss / ask via the built-in **babysit** skill. - Broken skill mid-task → fix it in its own PR. Don't block. Don't silently work around it. - Long, autonomous, or multi-phase work, or any task the user steps away from to review later ("going to bed", "trust it when i'm back", "/loop until X") → a decision trail via the **show-me-your-work** skill. Commit it when stakes need an auditable record; keep it local otherwise. @@ -77,7 +77,7 @@ Read the leaf skill in full for any principle you apply. Each entry names when i **Use `subagent_type: "poteto-agent"` for any subagent you spawn inside a playbook step** (code-writing delegates, ad-hoc helpers). `/poteto-mode` and `poteto-agent` route through the same wrapper. Routed workflow skills (`how`, `why`, `interrogate`, `reflect`) set their own `subagent_type` for diverse-model review; respect what the skill prescribes, don't override to `poteto-agent`. -**Defaults for every `Task` call.** `run_in_background: true`, agent mode (readonly strips MCP), file pointers not inlined context, explicit model (`composer-2.5-fast` for code, `claude-opus-4-8-thinking-xhigh` for prose and judgment). 
+**Defaults for every `Task` call.** `run_in_background: true`, agent mode (readonly strips MCP), file pointers not inlined context, explicit model per role (configurable via `/setup-pstack`; defaults `composer-2.5-fast` for code, `claude-opus-4-8-thinking-xhigh` for prose and judgment). You own every subagent's work. Review the diff and write your own summary, don't pass through what it said. Interrupt-chained resumes silently drop directives, so fire a fresh subagent with consolidated scope rather than trusting a "done" summary. A second opinion is the same prompt against a different model. Agreement is high-signal. diff --git a/pstack/skills/poteto-mode/playbooks/bug-fix.md b/pstack/skills/poteto-mode/playbooks/bug-fix.md index 00aa173..bd7e473 100644 --- a/pstack/skills/poteto-mode/playbooks/bug-fix.md +++ b/pstack/skills/poteto-mode/playbooks/bug-fix.md @@ -6,7 +6,7 @@ Be scientific. Every shipped line traces to runtime evidence. Belt-and-suspender 1. Reproduce it yourself on the matching surface via the control skill (Non-negotiables). Don't hand the repro to the user. A debug or instrumentation protocol that says to ask the user does not override this; you drive the instrumented runtime. Ask the user only with a stated, specific reason the control surface cannot reach the target, and only after driving it as far as it goes. Won't reproduce directly, force it: synthesize the trigger, tighten conditions, or instrument until it fires. 2. Binary-search the cause. Form the candidate hypotheses, then rule them out until one survives. Seed them with `how` over the affected subsystem and the **why** skill for regression history. Each pass, take the split that cuts the most remaining problem space, get runtime evidence, eliminate. When program state is unclear, add instrumentation or logging and read it as the code runs. Don't guess. Drive a long or stubborn hunt with Cursor's `/loop` command. 
Confirm the surviving *mechanism* with runtime evidence before the step-3 architect/interrogate fan-out. -3. Plan the fix. If it crosses a function boundary, `architect` first. Delegate implementation to a `composer-2.5-fast` subagent with a specific scope; review the diff. +3. Plan the fix. If it crosses a function boundary, `architect` first. Delegate implementation to a subagent using your configured bug-fix model (default `gpt-5.5-high-fast`) with a specific scope; review the diff. 4. Verify on the same surface; the original repro now passes. "Inconclusive" or wrong-surface is not a pass; flag it. Unit tests show branch behavior, not bug absence. 5. Stage the commits so the failing repro lands before the fix in git history; the diff tells the story. See the **tdd** skill for the failing-test-first cadence when the bug has a cheap local test path; skip it when the test would be expensive, integration-heavy, or unclear. 6. Run **Opening a PR**. diff --git a/pstack/skills/poteto-mode/playbooks/feature.md b/pstack/skills/poteto-mode/playbooks/feature.md index b1f545f..9807fb9 100644 --- a/pstack/skills/poteto-mode/playbooks/feature.md +++ b/pstack/skills/poteto-mode/playbooks/feature.md @@ -9,7 +9,7 @@ - **Independent workstreams.** Disjoint files, services, or layers parallelize. Shared writes serialize. - **Shared mutable state.** Default to splitting the target (the **separate-before-serializing-shared-state** principle skill). Serialize only for real invariants. - **Smallest safe decomposition.** If one worker is best, name why. -4. Delegate code-writing to a `composer-2.5-fast` subagent with a specific scope (file paths, named data shape, success criteria); review its diff yourself. When the implementation admits multiple valid shapes (error handling, abstraction layer, test structure), delegate via the **arena** skill instead so the runners surface the alternatives and the cross-judge guards the pick. 
Mandatory: no skip-with-reason escape, and Laziness Protocol does not override it (the gain is review separation, not lines saved). You can spawn a subagent even though you are one; "the app is small" and "a subagent cannot spawn one" are both wrong. A subagent forbidden to spawn satisfies this by owning the diff directly with the same review separation; no "standing by" reply that waits on a nested agent. Comments per **Comments**. Surgical edits, re-ground against the source for upstream-derived files. Port shared-primitive improvements to all consumers and verify each. Commit liberally. +4. Delegate code-writing to a subagent using your configured feature model (default `composer-2.5-fast`) with a specific scope (file paths, named data shape, success criteria); review its diff yourself. When the implementation admits multiple valid shapes (error handling, abstraction layer, test structure), delegate via the **arena** skill instead so the runners surface the alternatives and the cross-judge guards the pick. Mandatory: no skip-with-reason escape, and Laziness Protocol does not override it (the gain is review separation, not lines saved). You can spawn a subagent even though you are one; "the app is small" and "a subagent cannot spawn one" are both wrong. A subagent forbidden to spawn satisfies this by owning the diff directly with the same review separation; no "standing by" reply that waits on a nested agent. Comments per **Comments**. Surgical edits, re-ground against the source for upstream-derived files. Port shared-primitive improvements to all consumers and verify each. Commit liberally. 5. Verify on the matching surface. "Inconclusive" or wrong-surface is not a pass; flag it. 6. Rebase into small, ordered commits; stack follow-ups. 7. If the design is contested, `interrogate` before shipping. 
diff --git a/pstack/skills/poteto-mode/playbooks/perf-issue.md b/pstack/skills/poteto-mode/playbooks/perf-issue.md index 2ca9351..e38cd62 100644 --- a/pstack/skills/poteto-mode/playbooks/perf-issue.md +++ b/pstack/skills/poteto-mode/playbooks/perf-issue.md @@ -4,7 +4,7 @@ 1. Capture a baseline trace via the matching control skill. 2. `how` to ground hypotheses; don't claim a perf ceiling without running it first. -3. Plan the fix from the trace. If it crosses a function boundary, `architect` first. Delegate implementation to a `composer-2.5-fast` subagent; review the diff. Capture a post-fix trace. +3. Plan the fix from the trace. If it crosses a function boundary, `architect` first. Delegate implementation to a subagent using your configured perf-issue model (default `gpt-5.5-high-fast`); review the diff. Capture a post-fix trace. 4. Parse and compare the artifacts (JSON to sqlite, diff). "Inconclusive" or wrong-surface is not a pass; flag it. 5. Cite the measurement in the PR. 6. Run **Opening a PR**. diff --git a/pstack/skills/poteto-mode/playbooks/refactoring.md b/pstack/skills/poteto-mode/playbooks/refactoring.md index 344240c..a191d0b 100644 --- a/pstack/skills/poteto-mode/playbooks/refactoring.md +++ b/pstack/skills/poteto-mode/playbooks/refactoring.md @@ -7,7 +7,7 @@ If the cleanup reveals a missing feature or a real bug, split it out and ship th 1. Pin the behavior contract first. Run the **how** skill over the affected subsystem to learn the contract, then write a characterization test, snapshot, or equivalence harness that captures current behavior before any structure moves. The harness makes "refactor" a checkable claim (**principle-prove-it-works**). If the area has no coverage, write the pin before touching structure. Type check and lint are not a pin. 2. Name the target shape. State what the module layout, types, and call graph should be if built today (**principle-foundational-thinking**, **principle-redesign-from-first-principles**). 
If the target crosses a function boundary, run the **architect** skill for parallel design exploration of the shape before the move. 3. Subtract before you add. Delete dead weight, collapse one-caller wrappers, drop redundant validators, and remove orphan references before introducing the new shape (**principle-subtract-before-you-add**). The smallest change that reaches the target shape ships (**principle-laziness-protocol**). A speculative cleanup that "might help" gets reverted, not left to ride. -4. Move in small behavior-preserving steps, each keeping the pin green. For API reshapes, migrate every caller and delete the old API in the same wave (**principle-migrate-callers-then-delete-legacy-apis**). No compatibility shims, no parallel old-and-new paths. Spot-check every rename against the actual files; renames silently miss usages in strings, prose, and back-references. Delegate the mechanical edits to a `composer-2.5-fast` subagent with a specific scope (file paths, the names being moved, the behavior to hold); review the diff yourself. +4. Move in small behavior-preserving steps, each keeping the pin green. For API reshapes, migrate every caller and delete the old API in the same wave (**principle-migrate-callers-then-delete-legacy-apis**). No compatibility shims, no parallel old-and-new paths. Spot-check every rename against the actual files; renames silently miss usages in strings, prose, and back-references. Delegate the mechanical edits to a subagent using your configured refactoring model (default `composer-2.5-fast`) with a specific scope (file paths, the names being moved, the behavior to hold); review the diff yourself. 5. Prove behavior is unchanged on the real artifact, not "it compiles" (**principle-prove-it-works**). For larger reshapes, run an equivalence check: a script that diffs old-vs-new outputs, a recorded baseline replayed against the new code, or a smoke run on the matching surface via the relevant control skill. 
Own the verification yourself; do not trust a delegate's "looks good" summary. 6. Confirm the change earns its place. The success measure is reduced reader load (**principle-minimize-reader-load**): fewer layers between question and answer, less hidden state, fewer indirections without a second consumer. If the diff does not lower reader load somewhere, revert it. 7. Rebase into small ordered commits that tell the story. A subtraction commit, then the reshape, then any follow-on cleanup, so a single revert undoes one slice. Run **Opening a PR**. diff --git a/pstack/skills/poteto-mode/references/plan.md b/pstack/skills/poteto-mode/references/plan.md index 15c231d..37012f4 100644 --- a/pstack/skills/poteto-mode/references/plan.md +++ b/pstack/skills/poteto-mode/references/plan.md @@ -25,7 +25,7 @@ Resolve what is in scope vs explicitly out, technical or platform constraints, p Delegate codebase exploration (the **guard-the-context-window** principle skill). - Prefer `subagent_type: "poteto-agent"`. `generalPurpose` is the fallback. Never use the built-in `plan` subagent_type; it ignores this skill. -- Pass `model:` explicitly. `composer-2.5-fast` for code reads, `claude-opus-4-8-thinking-xhigh` for judgment. +- Pass `model:` explicitly per the configured roles (defaults `composer-2.5-fast` for code, `claude-opus-4-8-thinking-xhigh` for judgment). Each explorer returns file pointers, conventions, dependencies, test infrastructure, and entry points. No inlined dumps. 
diff --git a/pstack/skills/reflect/SKILL.md b/pstack/skills/reflect/SKILL.md index 2670372..8decf61 100644 --- a/pstack/skills/reflect/SKILL.md +++ b/pstack/skills/reflect/SKILL.md @@ -38,15 +38,15 @@ One message, three `Task` calls, `subagent_type: generalPurpose`, explicit `mode | Lens | `model` | Prompt template | |---|---|---| -| Judgment | `claude-opus-4-8-thinking-xhigh` | `references/judgment-reviewer.md` | -| Tooling | `composer-2.5-fast` | `references/tooling-reviewer.md` | -| Divergent | `claude-opus-4-8-thinking-xhigh` | `references/divergent-reviewer.md` | +| Judgment | your configured reflect-judgment model (default `claude-opus-4-8-thinking-xhigh`) | `references/judgment-reviewer.md` | +| Tooling | your configured reflect-tooling model (default `composer-2.5-fast`) | `references/tooling-reviewer.md` | +| Divergent | your configured reflect-judgment model (default `claude-opus-4-8-thinking-xhigh`) | `references/divergent-reviewer.md` | Pass each template verbatim, substituting the transcript path or digest where marked. Reviewers return findings in the `Task` response body. ### 3. Synthesize -One `Task` call, `subagent_type: generalPurpose`, `model: claude-opus-4-8-thinking-xhigh`, agent mode (`readonly: false`). The synthesizer's quality check includes spot-verifying citations, which can require MCP access; readonly strips MCPs. Use `references/synthesizer.md` verbatim, with each reviewer's full output inlined where marked. The synthesizer returns a structured Accepted / Rejected / Backlog list. +One `Task` call, `subagent_type: generalPurpose`, using your configured reflect-judgment model (default `claude-opus-4-8-thinking-xhigh`), agent mode (`readonly: false`). The synthesizer's quality check includes spot-verifying citations, which can require MCP access; readonly strips MCPs. Use `references/synthesizer.md` verbatim, with each reviewer's full output inlined where marked. The synthesizer returns a structured Accepted / Rejected / Backlog list. 
### 4. Structural enforcement check diff --git a/pstack/skills/setup-pstack/SKILL.md b/pstack/skills/setup-pstack/SKILL.md new file mode 100644 index 0000000..cc1de6d --- /dev/null +++ b/pstack/skills/setup-pstack/SKILL.md @@ -0,0 +1,55 @@ +--- +name: setup-pstack +description: Configure which models pstack uses per role. Detects your available models and writes an always-applied rule that overrides the skill defaults. Use for /setup-pstack, "configure pstack models", or changing pstack's model choices. +--- + +# Setup pstack + +Write `~/.cursor/rules/pstack-models.mdc`, an always-applied rule that sets pstack's model per role. The skills read it and fall back to their inline defaults when a line is absent, so this is an override layer, not a requirement. + +## Steps + +### 1. Detect available models + +Enumerate the model slugs you can pass to a `Task` subagent in this session; that is the dependable source. If Cursor also exposes a models API or CLI that lists the user's entitled models, prefer it for completeness. If you cannot detect any, ask the user to paste the slugs they have access to. Never write a slug you have not confirmed is available. + +### 2. Load current state + +The default role-to-model mapping is the rule shape shown in step 5 below. If `~/.cursor/rules/pstack-models.mdc` already exists, read it and treat its values as the current choices. Otherwise start from those defaults. + +### 3. Map and confirm + +Show every role with its current model, marking any whose model is not in the detected set as needing a choice. Ask whether to accept as-is or change specific roles, offering the detected models as the options. Prefer AskQuestion over free text. For panel roles (how critics, arena runners, architect runners, interrogate reviewers) the value is a list, and one subagent runs per model, so the list length sets the count. + +### 4. Validate + +Every slug written must be in the detected set. If a chosen slug is not available, stop and ask again. 
A rule pointing at a model the user cannot use breaks every delegation that reads it. + +### 5. Write the rule + +Write `~/.cursor/rules/pstack-models.mdc` with `alwaysApply: true` and one line per role, using the same labels poteto-mode uses. Overwrite the whole file so re-runs stay idempotent. Shape: + +``` +--- +description: pstack per-role model choices (overrides skill defaults) +alwaysApply: true +--- +# pstack model configuration. One line per role. Delete a line to fall back to the skill default. +feature, refactoring: composer-2.5-fast +bug-fix, perf-issue: gpt-5.5-high-fast +judgment and prose: claude-opus-4-8-thinking-xhigh +how explorer: composer-2.5-fast +how explainer: claude-opus-4-8-thinking-xhigh +how critics: claude-opus-4-8-thinking-xhigh, gpt-5.5-high-fast, composer-2.5-fast +why investigators: composer-2.5-fast +why synthesizer: claude-opus-4-8-thinking-xhigh +reflect tooling: composer-2.5-fast +reflect judgment, divergent, synthesizer: claude-opus-4-8-thinking-xhigh +arena runners: claude-opus-4-8-thinking-xhigh, gpt-5.5-high-fast, composer-2.5-fast +architect runners: claude-opus-4-8-thinking-xhigh, gpt-5.5-high-fast, composer-2.5-fast +interrogate reviewers: claude-opus-4-8-thinking-xhigh, gpt-5.5-high-fast, composer-2.5-fast +``` + +### 6. Confirm + +Tell the user the rule was written and that it applies to new sessions. Re-running this skill updates it. diff --git a/pstack/skills/why/SKILL.md b/pstack/skills/why/SKILL.md index 16adcae..def1710 100644 --- a/pstack/skills/why/SKILL.md +++ b/pstack/skills/why/SKILL.md @@ -117,7 +117,7 @@ Launch all matching investigators in a single message so they run concurrently. Subagent config (each): - `subagent_type`: `generalPurpose` -- `model`: `composer-2.5-fast` +- `model`: your configured why-investigators model (default `composer-2.5-fast`) - `readonly`: `false` (agent mode). **Do not use readonly/Ask mode.** It strips MCP access, which disables MCP-backed investigators entirely. 
The source control investigator would be safe in readonly, but keep modes uniform. Investigators still shouldn't write anything. That's a posture, not a sandbox. Each investigator gets: @@ -163,7 +163,7 @@ If your scope assessment suggests a single-commit trivial target where the PR de Spawn one synthesizer subagent: - `subagent_type`: `generalPurpose` -- `model`: `claude-opus-4-8-thinking-xhigh` +- `model`: your configured why-synthesizer model (default `claude-opus-4-8-thinking-xhigh`) - `readonly`: `false` (agent mode). The synthesizer's quality check spot-verifies citations, which can require MCP access. Readonly/Ask mode strips MCPs and defeats that. The synthesizer gets: