HeavySkill — Divergent-Think for Claude Code

Cross-framing heavy reasoning for Claude Code. It targets the two failure modes documented in the HeavySkill paper, "diversity collapse" (Appendix A) and "iteration drift" (Appendix B), and ships as five composable skills with one primary user-facing entry point: /divergent-think.

Why this exists

The HeavySkill paper showed that parallel reasoning followed by sequential deliberation outperforms majority voting and Best-of-N, but it also documented two limitations:

Paper finding	Limitation	Fix this repo ships
Appendix A	K parallel trajectories share the same prompt → Max-Diversity selection ≈ Random selection	`auto-reframe` generates K=6 axis-disjoint framings so each trajectory enters the problem through a structurally different conceptual lens
Appendix B	Iterative deliberation: HM@K rises but HP@K falls (in-frame noise accumulates)	`frame-critic` injects a fresh axis between iterations; sees only framings + summaries (never trajectory bodies), enforced by sub-agent context isolation

The divergent-think orchestrator combines both into one end-to-end pipeline; the original single-frame heavyskill is kept as a baseline for already-canonical problems (competition math, well-posed STEM).

What you get

Two complementary ways to use this repo:

Mode	Intended for	Entry point
A — Claude Code skill (recommended for interactive use)	Anyone using Claude Code as their primary harness	`/divergent-think <query>`
B — Python workflow (for paper repro & batch benchmarking)	Researchers running ablations against open-weight models	`python scripts/run_divergent.py ...`

Both modes implement the same divergent pipeline shape and the same hard limits defined in the paper: K=6 framings, K¹=4 deliberation samples, and N_max=3 critic iterations.

Architecture

                 ┌──────────────────────────────────────────────┐
   User query ──▶│           divergent-think (orchestrator)     │
                 └──────────────────────────────────────────────┘
                          │
                          ▼
           ┌──────────────────────────────────┐
   Stage 1 │ Reframe in-context               │  reads prompts/01-reframe.md
           │   → 6 framings, strict JSON      │
           │   (domain / abstraction / actor /│
           │    goal / scale / analogy)       │
           └──────────────────────────────────┘
                          │
                          ▼
           ┌──────────────────────────────────┐
   Stage 2 │ Spawn 6 Agents in parallel       │  reads prompts/02-worker.md
           │   each with ONE framing          │  ─▶ 6 trajectories
           │ Deliberate in-context (×4)       │  reads prompts/03-deliberation.md
           │   → summary[iter]                │
           └──────────────────────────────────┘
                          │
                          ▼
           ┌──────────────────────────────────┐
   Stage 3 │ Spawn 1 Agent (frame-critic)     │  reads prompts/04-critic.md
           │   sees ONLY framings + summaries │  ─▶ STOP | CONTINUE
           │   (NEVER trajectory bodies)      │
           └──────────────────────────────────┘
                          │
              ┌───────────┴──────────────┐
              │                          │
       STOP   ▼                          ▼  CONTINUE (≤ N_max=3)
       Final answer                Stage 4 — Partial re-run:
                                   spawn 1 Agent for the new
                                   framing only, re-deliberate,
                                   loop to Stage 3

The five skills

Skill	Role	User-invocable?
`divergent-think`	Orchestrator — only this skill executes the pipeline	✅ Primary entry point
`heavyskill`	Single-frame baseline (K=3 parallel on the same prompt)	✅ For canonical math/STEM
`auto-reframe`	Narrative reference for Stage 1	❌ Documentation only
`heavy-think-divergent`	Narrative reference for Stage 2	❌ Documentation only
`frame-critic`	Narrative reference for Stage 3	❌ Documentation only

The four executable prompt templates

The orchestrator reads these at runtime — they are the single source of truth for prompt content:

.claude/skills/divergent-think/prompts/
├── 01-reframe.md         # Stage 1 protocol (6 axes + anti-anchoring + JSON schema)
├── 02-worker.md          # Stage 2 per-framing worker prompt
├── 03-deliberation.md    # Stage 2 cross-framing synthesis prompt
└── 04-critic.md          # Stage 3 STOP/CONTINUE prompt

Edit these files to tune behavior. As of v0.3.0 the SKILL.md files contain no inlined prompt content: the orchestrator reads these templates at each stage, so prompt changes belong here rather than in any SKILL.md.

Quick start — Mode A (Claude Code skill)

Install the skill bundle

Download or copy the bundled tarball (dist/heavyskill-skill-v0.3.0.tar.gz) and extract it into either your project's .claude/skills/ or your user-level ~/.claude/skills/:

# Project-level install (recommended while iterating)
mkdir -p .claude/skills
tar -xzvf heavyskill-skill-v0.3.0.tar.gz -C .claude/skills/

# OR user-level install (available across all your projects)
mkdir -p ~/.claude/skills
tar -xzvf heavyskill-skill-v0.3.0.tar.gz -C ~/.claude/skills/

After extraction you should see five skill folders (divergent-think/, heavyskill/, auto-reframe/, heavy-think-divergent/, frame-critic/) and the divergent-think/prompts/ subdirectory with four .md files.
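If you prefer to check the extracted layout programmatically, the sketch below walks the expected paths in Python. It is not a script shipped with the bundle; the folder and file names come from the paragraph above and the repository layout, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# Expected layout, taken from the install notes and the repository layout.
SKILLS = ["divergent-think", "heavyskill", "auto-reframe",
          "heavy-think-divergent", "frame-critic"]
PROMPTS = ["01-reframe.md", "02-worker.md", "03-deliberation.md", "04-critic.md"]


def missing_files(skills_root: Path) -> list[str]:
    """Return the paths (relative to skills_root) that should exist but do not."""
    missing = []
    # Every skill folder is expected to carry its own SKILL.md.
    for name in SKILLS:
        skill_md = skills_root / name / "SKILL.md"
        if not skill_md.is_file():
            missing.append(str(skill_md.relative_to(skills_root)))
    # The orchestrator reads these four templates at runtime.
    for prompt in PROMPTS:
        path = skills_root / "divergent-think" / "prompts" / prompt
        if not path.is_file():
            missing.append(str(path.relative_to(skills_root)))
    return missing
```

Call `missing_files(Path(".claude/skills"))` (or the user-level path) from the project root; an empty list means every SKILL.md and all four prompt templates are where the orchestrator expects them.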

Verify the install

In a fresh Claude Code session inside the project:

/divergent-think 設計一個讓 SaaS 用戶留存率上升的功能（不能讓 DAU 下降）

(English: "Design a feature that increases SaaS user retention without causing DAU to drop.")

You should observe:

Stage 1's 6-framing JSON emitted inline by the orchestrator
Six parallel Agent tool calls in the terminal (one per axis); this is the key visible signal that the pipeline is working
A 7th Agent call for the frame-critic
A clean final answer in Chinese (matching the query's language), with no meta-narration preamble

If you see only the Stage 1 JSON and no parallel Agent calls, the orchestrator is most likely not reading prompts/02-worker.md. Re-extract the tarball and confirm that the prompts/ directory landed inside divergent-think/.

Run

/divergent-think <multi-vector reasoning query>    # full divergent pipeline (1–3 min)
/heavyskill      <canonical math/STEM query>       # single-frame baseline (faster, cheaper)

The orchestrator auto-detects when a query is canonical and may hand off to heavyskill itself.

Quick start — Mode B (Python workflow, for paper repro)

git clone https://github.com/wjn1996/HeavySkill.git
cd HeavySkill
pip install -e .

Run the divergent pipeline (matches divergent-think skill semantics with K=6, K¹=4, N_max=3 defaults):

python scripts/run_divergent.py \
    --query "Find the number of paths of length 16 on an 8x8 grid that change direction exactly four times." \
    --model "deepseek-r1" \
    --api_base "http://localhost:8080" \
    --output "outputs/divergent_result.json" \
    --verbose

Run the single-frame baseline (matches heavyskill skill):

python scripts/run_heavyskill.py \
    --query "Your problem here" \
    --model "deepseek-r1" \
    --api_base "http://localhost:8080" \
    --reason_k 8 --summary_k 4 \
    --output "outputs/baseline_result.json"

Using a separate deliberation model:

python scripts/run_heavyskill.py \
    --query "..." \
    --model         "r1-distill-qwen-7b"   --api_base         "http://localhost:8080" \
    --summary_model "qwen3-32b"            --summary_api_base "http://localhost:8081" \
    --reason_k 16 --summary_k 4

Batch evaluation:

python scripts/run_heavyskill.py \
    --input_file "examples/example_math.json" \
    --model "deepseek-r1" --api_base "http://localhost:8080" \
    --output "outputs/batch_result.json"

The Python pipeline supports any OpenAI-compatible endpoint: vLLM, DeepSeek API, Together AI, OpenRouter, local Ollama, etc.

Pipeline detail — what each stage does

Stage 1 — Reframe (in-context)

The orchestrator reads prompts/01-reframe.md and executes the protocol in its own context, producing 6 framings, one per required axis (domain, abstraction, actor, goal, scale, analogy). Anti-anchoring guards cap each framing name's reuse of the query's content tokens at 30%; this prevents, for example, a "security audit" framing for a query that is itself a security audit.
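The 30% rule can be checked mechanically. This is a minimal sketch under stated assumptions: it reads the rule literally as the share of the query's distinct content tokens that reappear in a framing name, uses a tiny hypothetical stopword list and an ASCII-only tokenizer (so it would not handle a Chinese query), and assumes each framing is a dict with `axis` and `name` keys. The real schema lives in prompts/01-reframe.md and may differ.

```python
import re

# Hypothetical stopword list; the real guard's notion of "content token" may differ.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "or",
             "is", "this", "that", "with", "how", "what", "do", "does"}
REQUIRED_AXES = {"domain", "abstraction", "actor", "goal", "scale", "analogy"}


def content_tokens(text: str) -> set[str]:
    """Lowercased ASCII word tokens, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def reuse_ratio(framing_name: str, query: str) -> float:
    """Share of the query's content tokens that reappear in the framing name."""
    q = content_tokens(query)
    if not q:
        return 0.0
    return len(q & content_tokens(framing_name)) / len(q)


def check_framings(framings: list[dict], query: str, max_reuse: float = 0.30) -> list[str]:
    """Return a list of problems; an empty list means the framing set passes."""
    problems = []
    # Exactly one framing per required axis.
    axes = [f["axis"] for f in framings]
    if sorted(axes) != sorted(REQUIRED_AXES):
        problems.append(f"need one framing per axis {sorted(REQUIRED_AXES)}, got {axes}")
    # Anti-anchoring: a framing name must not simply echo the query.
    for f in framings:
        r = reuse_ratio(f["name"], query)
        if r > max_reuse:
            problems.append(f"{f['name']!r} reuses {r:.0%} of the query's content tokens")
    return problems
```

For the query "Audit the security of this login service", a framing named "Security audit" reuses 2 of 4 content tokens (50%) and is flagged, while an analogy framing such as "Bank vault heist" reuses none.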

Stage 2 — Parallel framed reasoning (6 sub-agents)

The orchestrator reads prompts/02-worker.md, fills in its 5 placeholders for each framing, and dispatches all 6 Agent tool calls in parallel within a single response. Each sub-agent reasons in its framing's vocabulary, then translates the result back to the original query. The trajectories are returned to the orchestrator's context.
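In Mode A the parallelism comes from issuing all six Agent tool calls in one response; the Python workflow can get the same effect with an async client. The sketch below shows the shape with `asyncio.gather` and a stub worker. The five placeholder names in `WORKER_TEMPLATE` are invented for illustration, since the real ones are defined in prompts/02-worker.md.

```python
import asyncio

# Hypothetical stand-in for prompts/02-worker.md; real placeholder names may differ.
WORKER_TEMPLATE = (
    "Original query: {query}\n"
    "Axis: {axis}\n"
    "Framing: {framing_name}\n"
    "Framing description: {framing_description}\n"
    "Answer in: {language}\n"
    "Reason entirely inside this framing, then translate back to the original query."
)


async def run_worker(prompt: str) -> str:
    """Stub for one sub-agent call; a real version would call a model endpoint."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"trajectory for: {prompt.splitlines()[2]}"


async def dispatch(query: str, framings: list[dict], language: str) -> list[str]:
    """Fill the template once per framing and run all workers concurrently."""
    prompts = [
        WORKER_TEMPLATE.format(
            query=query,
            axis=f["axis"],
            framing_name=f["name"],
            framing_description=f["description"],
            language=language,
        )
        for f in framings
    ]
    # gather preserves argument order, so trajectory i stays paired with framing i.
    return await asyncio.gather(*(run_worker(p) for p in prompts))
```

Keeping results in argument order matters here: the deliberation step needs to know which trajectory came from which framing.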

Stage 2 — Cross-framing deliberation (in-context, K¹=4 samples)

The orchestrator reads prompts/03-deliberation.md and runs the synthesis itself, because it must hold all 6 trajectories at once and so cannot delegate this step. It produces 4 samples and keeps the most internally consistent one. Each summary either explicitly names a cross-framing combination that no single framing produced alone, or honestly reports "no genuine combination found".
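"Most internally consistent" is not pinned down above, so the sketch below substitutes a simple, clearly swapped-in heuristic: pick the medoid, the sample with the highest mean token-level Jaccard similarity to the other samples. It measures agreement between samples rather than consistency within one, and prompts/03-deliberation.md may use a different criterion entirely.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two texts."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pick_summary(samples: list[str]) -> str:
    """Return the medoid: the sample with the highest mean similarity to the others."""
    if len(samples) == 1:
        return samples[0]

    def mean_sim(i: int) -> float:
        sims = [jaccard(samples[i], s) for j, s in enumerate(samples) if j != i]
        return sum(sims) / len(sims)

    # max() keeps the first index on ties, so earlier samples win a tie.
    best = max(range(len(samples)), key=mean_sim)
    return samples[best]
```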

Stage 3 — Frame-critic (1 sub-agent, STOP/CONTINUE)

The orchestrator reads prompts/04-critic.md and spawns ONE Agent call with {query, framings, summaries_history, iterations_done, n_max, axes_unused}, never the trajectory bodies. Sub-agent context isolation enforces the contract that the critic stays outside the deliberation frame. The critic returns strict JSON: either STOP, or CONTINUE together with a new framing and the axis to evict.
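The isolation contract is easiest to keep if the critic's input is built from an allow-list rather than by deleting trajectory fields. The sketch below does that with the six fields named above, then validates the reply. The reply keys `decision`, `new_framing`, and `evict_axis` are hypothetical; the actual schema is defined in prompts/04-critic.md.

```python
import json

# The six fields the critic is allowed to see; trajectory bodies are not among them.
ALLOWED_KEYS = {"query", "framings", "summaries_history",
                "iterations_done", "n_max", "axes_unused"}


def build_critic_input(state: dict) -> str:
    """Serialize only allow-listed fields, so trajectories cannot leak by accident."""
    payload = {k: state[k] for k in ALLOWED_KEYS}
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


def parse_verdict(raw: str) -> dict:
    """Validate the critic's strict-JSON reply (key names are hypothetical)."""
    verdict = json.loads(raw)
    decision = verdict.get("decision")
    if decision == "STOP":
        return {"decision": "STOP"}
    if decision == "CONTINUE":
        for key in ("new_framing", "evict_axis"):
            if key not in verdict:
                raise ValueError(f"CONTINUE verdict missing {key!r}")
        return {"decision": "CONTINUE",
                "new_framing": verdict["new_framing"],
                "evict_axis": verdict["evict_axis"]}
    raise ValueError(f"unknown decision: {decision!r}")
```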

Stage 4 — Iteration (CONTINUE only)

Swap one (framing, trajectory) pair: spawn ONE Agent to produce the new framing's trajectory only, re-run Stage 2 deliberation over the updated 6-tuple, and loop back to Stage 3. Each iteration adds one worker call rather than K, so cost grows linearly in N rather than as N × K.
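The loop below sketches Stages 2 to 4 with injected stub callables, mainly to make the call accounting concrete. The function and parameter names are hypothetical, and workflow/divergent/pipeline.py may be organized differently. It assumes the critic's CONTINUE verdict carries `evict_axis` and `new_framing` keys, and that a CONTINUE on the final allowed iteration is treated as STOP.

```python
from typing import Callable


def run_loop(framings: list[dict],
             worker: Callable[[dict], str],
             deliberate: Callable[[list[dict], list[str]], str],
             critic: Callable[[list[dict], list[str], int], dict],
             n_max: int = 3) -> tuple[str, int]:
    """Critic-driven outer loop. Returns (final summary, number of sub-agent calls)."""
    # Stage 2: one worker per framing (issued in parallel in practice).
    trajectories = [worker(f) for f in framings]
    calls = len(framings)
    # Deliberation runs in-context, so it is not counted as a sub-agent call.
    summaries = [deliberate(framings, trajectories)]
    for iteration in range(n_max + 1):
        verdict = critic(framings, summaries, iteration)  # Stage 3: one sub-agent call
        calls += 1
        if verdict["decision"] == "STOP" or iteration == n_max:
            break
        # Stage 4: swap exactly one (framing, trajectory) pair.
        idx = next((i for i, f in enumerate(framings)
                    if f["axis"] == verdict["evict_axis"]), None)
        if idx is None:
            raise ValueError(f"critic asked to evict unknown axis {verdict['evict_axis']!r}")
        new = verdict["new_framing"]
        framings = framings[:idx] + [new] + framings[idx + 1:]
        trajectories = trajectories[:idx] + [worker(new)] + trajectories[idx + 1:]
        calls += 1  # one replacement worker, not K
        summaries.append(deliberate(framings, trajectories))
    return summaries[-1], calls
```

With six framings and N_max=3, an immediate STOP costs 7 sub-agent calls and three CONTINUEs cost 13 (6 workers, 3 replacement workers, 4 critic calls), consistent with the 6 + 1 + N_max × (1 worker + 1 critic) bound in the Cost / latency table.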

Stage 5 — Final answer

Match the original query's language and format conventions. No "after deep thinking..." preamble. The user sees an answer that reads as if written directly in response to the query.

Cost / latency

Resource	Typical full run
Input + output tokens	~200k – 400k (varies with query complexity)
Wall clock	1–3 minutes (depends on sub-agent throughput)
Sub-agent calls	6 (Stage 2 parallel) + 1 (Stage 3 critic) + up to N_max × (1 worker + 1 critic) on CONTINUE branches

Before starting, the orchestrator tells the user "Running divergent-think (~1–3 min)" so they can interrupt if metered-API cost is a concern. The hard max_total_tokens budget guard (set to 1.5M tokens in the Python config) triggers an early STOP and returns the best summary so far once the budget is exceeded.
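A budget guard of this kind only needs a running tally and the latest usable summary. The class below is a hypothetical sketch, not the workflow's actual implementation; it assumes per-call token counts are available (OpenAI-compatible responses usually report them in a `usage` field) and treats the most recent deliberation summary as the "best so far".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBudget:
    """Running token tally with an early-STOP threshold (hypothetical helper)."""
    max_total_tokens: int = 1_500_000  # the 1.5M figure quoted above
    used: int = 0
    best_summary: Optional[str] = None

    def record(self, input_tokens: int, output_tokens: int,
               summary: Optional[str] = None) -> bool:
        """Add one call's usage; return True if the run must stop early."""
        self.used += input_tokens + output_tokens
        if summary is not None:
            # Assumption: the latest deliberation summary counts as "best so far".
            self.best_summary = summary
        return self.used > self.max_total_tokens
```

Call `record` after every model call and, when it returns True, stop iterating and return `best_summary`.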

For canonical math/STEM problems where the framing is already given, prefer /heavyskill: it runs K=3 parallel trajectories on a single prompt (no reframe, no critic) and typically costs about a third as much.

Repository layout

HeavySkill/
├── .claude/skills/                  # Mode A: Claude Code skill bundle
│   ├── divergent-think/
│   │   ├── SKILL.md                 # Orchestrator logic (v0.3.0)
│   │   └── prompts/                 # Executable prompt templates
│   │       ├── 01-reframe.md
│   │       ├── 02-worker.md
│   │       ├── 03-deliberation.md
│   │       └── 04-critic.md
│   ├── heavyskill/SKILL.md          # Single-frame baseline
│   ├── auto-reframe/SKILL.md        # Narrative reference for Stage 1
│   ├── heavy-think-divergent/SKILL.md  # Narrative reference for Stage 2
│   └── frame-critic/SKILL.md        # Narrative reference for Stage 3
│
├── workflow/                        # Mode B: Python pipeline
│   ├── config.py                    # HeavySkillConfig dataclass
│   ├── pipeline.py                  # Single-frame orchestration
│   ├── prompts.py                   # Prompt templates (general / STEM, CN / EN)
│   ├── divergent/                   # Divergent variant
│   │   ├── pipeline.py              # Critic-driven outer loop
│   │   ├── reframer.py              # auto-reframe equivalent
│   │   ├── cross_framing_deliberation.py
│   │   ├── frame_critic.py
│   │   ├── axes.py / distance.py / metrics.py / types.py
│   │   └── config.py                # K=6, K¹=4, N_max=3 defaults
│   └── agent/openai_compatible.py   # Async OpenAI-compatible client
│
├── scripts/
│   ├── run_heavyskill.py            # Mode B single-frame CLI
│   ├── run_divergent.py             # Mode B divergent CLI
│   ├── run_benchmark.py             # Batch benchmark harness
│   ├── judge_arena.py               # Arena-Hard auto-judge
│   └── evaluate.py                  # Accuracy evaluation utility
│
├── examples/
│   ├── example_math.json
│   ├── aime_hmmt_subset.json
│   ├── arena_hard_subset.json
│   └── ctf_seed_v0.json
│
├── paper/heavyskill.pdf             # Paper (arXiv:2605.02396)
├── tests/                           # pytest suite
├── dist/                            # Skill tarball releases (gitignored)
├── pyproject.toml
└── README.md

Contributing

Issues and PRs welcome. A few conventions to make review fast:

Prompt edits belong in .claude/skills/divergent-think/prompts/*.md, not in any SKILL.md. The SKILL files describe orchestration and concepts; prompt content is in the runtime files.
Changes to axes or framing rules belong in prompts/01-reframe.md and must also be mirrored in the corresponding section of workflow/divergent/axes.py, so the two modes stay consistent.
Smoke test before sending a PR: run /divergent-think <a short multi-vector query> and confirm the terminal shows 6 parallel Agent calls + 1 critic Agent call. If it doesn't, the pipeline is broken.
Hard limits (K=6, K¹=4, N_max=3) come from the paper. Change them in the config file (workflow/divergent/config.py holds the defaults), not by hard-coding new numbers in prompts.
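One way to keep the limits in a single place is a frozen dataclass with validation. The sketch below is illustrative only: `DivergentConfig` and its field names are hypothetical and are not the contents of workflow/divergent/config.py or the HeavySkillConfig dataclass.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DivergentConfig:
    """Hypothetical single source of truth for the paper's limits."""
    num_framings: int = 6            # K
    deliberation_samples: int = 4    # K¹
    n_max: int = 3                   # maximum critic-driven iterations
    max_total_tokens: int = 1_500_000

    def __post_init__(self) -> None:
        # Reject nonsensical values at construction time.
        if self.num_framings < 1 or self.deliberation_samples < 1 or self.n_max < 0:
            raise ValueError("framings and samples must be >= 1, n_max must be >= 0")
```

For an ablation, derive a variant with `replace(DivergentConfig(), n_max=1)` instead of editing numbers inside a prompt; `dataclasses.replace` builds a new instance through the constructor, so the validation in `__post_init__` runs again.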

Citation

If you use this work, please cite the original paper:

@article{wang2026heavyskill,
  title={HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness},
  author={Wang, Jianing and Guo, Linsen and Chen, Zhengyu and Guo, Qi and Zang, Hongyu and Shi, Wenjie and Ma, Haoxiang and Xi, Xiangyu and Li, Xiaoyu and Wang, Wei and Cai, Xunliang},
  journal={arXiv preprint arXiv:2605.02396},
  year={2026},
  url={https://arxiv.org/abs/2605.02396}
}

The Claude Code skill bundle in .claude/skills/ is an independent implementation of the paper's divergent variant (and its single-frame baseline) packaged for the Claude Code agentic harness. Bug reports against the skill bundle are tracked separately from paper errata.

License

Apache-2.0
