Maximum reasoning + multi-agent orchestration for any task.
"Don't think harder. Think deeper, in parallel, and verify."
Quick Start · How It Works · Comparison · Patterns · Quality Gate
AI coding assistants often:
- Rush to a solution without thinking deeply
- Handle complex tasks sequentially when parts could run in parallel
- Skip verification and deliver unchecked output
- Use the same approach for a typo fix and an architecture migration
Deep Work fixes all four. It's a meta-orchestrator skill that automatically classifies task complexity, scales agent count to match, parallelizes independent work, enforces deep reasoning, and quality-gates every deliverable with a fresh-eyes critic.
```bash
# Option 1: Clone the repo
git clone https://github.com/Harsh9005/deep-work.git
cp deep-work/SKILL.md ~/.claude/skills/deep-work/SKILL.md

# Option 2: Direct download
mkdir -p ~/.claude/skills/deep-work
curl -o ~/.claude/skills/deep-work/SKILL.md \
  https://raw.githubusercontent.com/Harsh9005/deep-work/main/SKILL.md
```

In Claude Code, just say:
```
deep work: refactor the authentication module
deep mode: write a technical design doc for the new API
max mode: debug why the payment flow is failing in production
```
Deep Work activates automatically and orchestrates the full pipeline.
The skill is defined entirely in SKILL.md as a prompt-based orchestration framework. You can:
- Use it directly as a system prompt for any Claude-based agent
- Adapt the phase structure for your own multi-agent pipelines
- Extract individual phases (e.g., just the Quality Gate) for simpler workflows
```mermaid
flowchart TD
    A["Phase 0: Deep Analysis\nultrathink"] --> B{"Complexity?"}
    B -->|LIGHT| D
    B -->|MEDIUM| C["Phase 1: Parallel Research\n2-4 Explore agents"]
    B -->|HEAVY| C
    B -->|WRITING| C
    C -->|HEAVY only| P["Phase 2: Architect\nPlan agent"]
    C -->|MEDIUM/WRITING| D
    P --> D["Phase 3: Parallel Execution\n2-5 Worker agents (opus)"]
    D --> E["Phase 4: Synthesis\nultrathink merge"]
    E --> F["Phase 5: Quality Gate\nFresh-eyes Critic"]
    F -->|PASS| G["Phase 6: Deliver"]
    F -->|NEEDS_FIX| H["Fix Issues"]
    H --> F
    H -->|"Max 2 iterations"| G
    style A fill:#6C5CE7,color:#fff
    style C fill:#00B894,color:#fff
    style D fill:#0984E3,color:#fff
    style E fill:#6C5CE7,color:#fff
    style F fill:#E17055,color:#fff
    style G fill:#00B894,color:#fff
```
Not every task needs 10 agents. Deep Work classifies and adapts:
| Complexity | Signals | Total Agents | Phases Used |
|---|---|---|---|
| LIGHT | Single file, clear fix, < 3 steps | 2 | 0 → 3 → 5 → 6 |
| MEDIUM | Multi-file, some ambiguity, 3-7 steps | 5-6 | 0 → 1 → 3 → 4 → 5 → 6 |
| HEAVY | Architecture change, multi-system, 8+ steps | 8-10 | All 7 phases |
| WRITING | Documents, reports, manuscripts | 5-7 | 0 → 1 → 3 → 4 → 5 → 6 |
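For readers adapting the skill to their own pipelines, the classification above can be sketched in code. The signal keywords and thresholds below are illustrative guesses, not the actual heuristics in SKILL.md; the agent budgets and phase lists mirror the table.

```python
# Illustrative sketch of the Phase 0 complexity classifier.
# Keywords/thresholds are assumptions; tier budgets come from the table above.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    agents: range    # total agent budget for the tier
    phases: tuple    # phase numbers used

TIERS = {
    "LIGHT":   Tier("LIGHT",   range(2, 3),  (0, 3, 5, 6)),
    "MEDIUM":  Tier("MEDIUM",  range(5, 7),  (0, 1, 3, 4, 5, 6)),
    "HEAVY":   Tier("HEAVY",   range(8, 11), (0, 1, 2, 3, 4, 5, 6)),
    "WRITING": Tier("WRITING", range(5, 8),  (0, 1, 3, 4, 5, 6)),
}

def classify(task: str, files_touched: int, steps: int) -> Tier:
    """Map rough task signals to a complexity tier."""
    if any(w in task.lower() for w in ("report", "doc", "manuscript", "write")):
        return TIERS["WRITING"]
    if steps >= 8 or "architecture" in task.lower():
        return TIERS["HEAVY"]
    if files_touched <= 1 and steps < 3:
        return TIERS["LIGHT"]
    return TIERS["MEDIUM"]
```

A typo fix (one file, one step) lands in LIGHT; a ten-step architecture migration lands in HEAVY.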
| Phase | What Happens | Model | Concurrency |
|---|---|---|---|
| 0. Deep Analysis | Classify task, assess complexity, decompose into subtasks | ultrathink | — |
| 1. Parallel Research | Gather context: code patterns, dependencies, test coverage, prior work | haiku | All agents parallel |
| 2. Architect | Design execution strategy, dependency map, risk assessment | opus | Single |
| 3. Parallel Execution | Workers execute subtasks with deep reasoning | opus + ultrathink | All independent workers parallel |
| 4. Synthesis | Merge outputs, resolve conflicts, fill gaps, verify completeness | ultrathink | — |
| 5. Quality Gate | Fresh-eyes critic reviews against original task | opus + ultrathink | Single |
| 6. Fix + Deliver | Address issues (max 2 iterations), deliver with summary | — | — |
| Feature | Single Agent | CrewAI | AutoGen | LangGraph | Deep Work |
|---|---|---|---|---|---|
| Adaptive complexity | ❌ | Fixed roles | Fixed roles | Manual graph | 4-tier auto-scaling |
| Quality gate | ❌ | ❌ | ❌ | ❌ | Mandatory fresh-eyes critic |
| Deep reasoning | Optional | ❌ | ❌ | ❌ | Enforced at every phase |
| Zero config | Yes | Python setup | Python setup | Python setup | Copy 1 file, done |
| Parallel execution | ❌ | Sequential | Round-robin | Configurable | Auto-parallel independent tasks |
| Fix iteration loop | ❌ | ❌ | ❌ | Manual | Built-in (max 2 rounds) |
| Task decomposition | Manual | Manual | Manual | Manual | Automatic in Phase 0 |
| Works without code | N/A | No | No | No | Prompt-only, no dependencies |
| Error recovery | ❌ | ❌ | Retry | Configurable | Retry → absorb → downgrade |
Key differentiator: Deep Work is a prompt-only skill with no Python packages, no API wrappers, and no infrastructure. It works with any Claude-based agent by defining the orchestration protocol in a single markdown file.
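Because the skill is a single markdown file, wiring it into a generic chat-style client is just reading the file into the system prompt. A minimal sketch, where the request shape and the model name are placeholders to adapt to your client library:

```python
# Sketch: use SKILL.md as a system prompt for a generic chat client.
# The request dict shape and "claude-opus-4" model name are assumptions;
# substitute whatever your client library expects.
from pathlib import Path

# Install location from the Quick Start above
SKILL_PATH = Path.home() / ".claude" / "skills" / "deep-work" / "SKILL.md"

def build_request(task: str, skill_text: str, model: str = "claude-opus-4") -> dict:
    """Wrap the orchestration protocol around a user task."""
    return {
        "model": model,
        "system": skill_text,  # the entire SKILL.md file as the system prompt
        "messages": [{"role": "user", "content": f"deep work: {task}"}],
    }

# request = build_request("refactor the authentication module",
#                         SKILL_PATH.read_text())
```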
Feature implementation:
```
Worker 1: Implement core logic (main module)  ──┐
Worker 2: Write tests (test file)             ──┼── All parallel
Worker 3: Update configuration/types          ──┘
```

Refactor:
```
Worker 1: Refactor module A ──┐
Worker 2: Refactor module B ──┼── Parallel
                              │
Worker 3: Update imports ─────┘── Sequential (needs 1-2 output)
```

Writing:
```
Worker 1: Write sections 1-3 ───┐
Worker 2: Write sections 4-6 ───┼── Parallel
                                │
Worker 3: Fact-check all claims─┘── Sequential (needs draft)
```

Debugging:
```
Worker 1: Reproduce + isolate
        ↓
Worker 2: Root cause analysis
        ↓
Worker 3: Implement + test fix
```

Research:
```
Worker 1: Analyze data source A ──┐
Worker 2: Analyze data source B ──┼── Parallel
                                  │
Worker 3: Cross-source synthesis ─┘── Sequential
```
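The refactor pattern above could be scripted like this in a custom pipeline, using threads as a stand-in for agent spawns; `run_worker` is a placeholder, not part of SKILL.md:

```python
# Sketch of the parallel-then-sequential worker pattern.
# run_worker is a placeholder: a real pipeline would spawn an agent here.
from concurrent.futures import ThreadPoolExecutor

def run_worker(name: str, inputs=None) -> str:
    # Stand-in for an actual agent call
    return f"{name} done (inputs: {inputs})"

def refactor_pattern() -> str:
    # Workers 1 and 2 are independent, so they run concurrently
    with ThreadPoolExecutor() as pool:
        a = pool.submit(run_worker, "Refactor module A")
        b = pool.submit(run_worker, "Refactor module B")
        parallel_results = [a.result(), b.result()]  # join before step 3
    # Worker 3 is sequential: it needs the outputs of workers 1 and 2
    return run_worker("Update imports", inputs=parallel_results)
```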
The Phase 5 critic has fresh eyes: it never saw intermediate work. It only sees the original task and the final output. This catches category errors that insiders miss.
```json
{
  "verdict": "PASS | NEEDS_FIX",
  "confidence": "HIGH | MEDIUM | LOW",
  "strengths": ["what's done well"],
  "issues": [
    {
      "severity": "CRITICAL | MAJOR | MINOR",
      "location": "where in the output",
      "issue": "what's wrong",
      "fix": "specific fix recommendation"
    }
  ],
  "missing": ["anything the output should include but doesn't"],
  "overall_assessment": "1-2 sentence summary"
}
```

Universal checks
- Does the output fully address the original task? (completeness)
- Are there any logical errors or inconsistencies? (correctness)
- Is the quality at staff-engineer / senior-researcher level? (quality)
- Would a domain expert find issues? (expertise check)
- Is there unnecessary complexity that should be simplified? (elegance)
Code-specific checks
- Edge cases not handled?
- Potential bugs, race conditions, or security issues?
- Follows codebase's existing patterns and conventions?
- Tests comprehensive? Cover failure cases?
- Will this break existing functionality?
Writing-specific checks
- All claims supported by evidence?
- Argumentation logically sound?
- Gaps in coverage?
- Tone and style appropriate and consistent?
- Citations/references correct and complete?
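Pipelines that script the critic can validate its report against the JSON shape above before acting on it. This validator is an illustrative sketch, not part of SKILL.md; the verdict and severity sets come straight from the schema:

```python
# Sketch: validate a critic report against the schema shown above.
VERDICTS = {"PASS", "NEEDS_FIX"}
SEVERITIES = {"CRITICAL", "MAJOR", "MINOR"}

def validate_report(report: dict) -> list[str]:
    """Return a list of schema problems; empty means well-formed."""
    problems = []
    if report.get("verdict") not in VERDICTS:
        problems.append(f"bad verdict: {report.get('verdict')}")
    for issue in report.get("issues", []):
        if issue.get("severity") not in SEVERITIES:
            problems.append(f"bad severity: {issue.get('severity')}")
        if not issue.get("fix"):
            problems.append("issue missing a fix recommendation")
    return problems
```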
| Scenario | Deep Work Response |
|---|---|
| Agent fails or returns garbage | Retry once with simplified prompt; if still fails, absorb into main context |
| Workers produce conflicting outputs | Ultrathink evaluates both, picks the better approach, documents decision |
| Critic finds CRITICAL issues after 2 rounds | Deliver with transparent disclosure; never hide problems |
| Task simpler than classified | Downgrade mid-execution, kill unnecessary agents |
| Task harder than classified | Upgrade, spawn additional agents as needed |
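The first row's retry-then-absorb chain might look like this in a scripted pipeline; `run_agent` and the `do_locally` fallback are placeholders, not part of SKILL.md:

```python
# Sketch of the "retry once, then absorb" recovery from the table above.
def run_agent(prompt: str, simplified: bool = False):
    # Stand-in for a real agent spawn; here it always fails, to show the chain
    raise NotImplementedError

def run_with_recovery(prompt: str, do_locally):
    """Try the agent, retry once with a simplified prompt, else absorb."""
    for simplified in (False, True):
        try:
            return run_agent(prompt, simplified=simplified)
        except Exception:
            continue  # first failure triggers the simplified retry
    # Both attempts failed: absorb the subtask into the main context
    return do_locally(prompt)
```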
Why ultrathink everywhere?
Deep thinking at Phase 0 prevents wasted agent spawns. At Phase 4 it catches merge conflicts. At Phase 5 it catches what workers missed. The cost of extended thinking is negligible compared to delivering wrong output.
Why a fresh-eyes critic?
The orchestrator and workers develop "tunnel vision." A critic that only sees the original task and final output catches category errors that insiders miss.
Why max 2 fix iterations?
Diminishing returns. If the critic still finds critical issues after 2 rounds, the problem is likely structural and needs human judgment, not more agent cycles.
Why adaptive complexity?
Spawning 5 agents for a typo fix is waste. Using 1 agent for an architecture migration is negligence. The complexity classifier ensures the right resources for the right task.
Why prompt-only (no code)?
Zero dependencies = zero friction. Works with any Claude-based agent. No package manager, no API wrappers, no infrastructure. Copy one file, done.
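For readers extracting the Quality Gate into their own workflow, the capped fix loop can be sketched as follows; `critique` and `apply_fixes` are placeholders for the critic and worker calls:

```python
# Sketch of the Phase 5-6 loop: re-run the critic after each fix round,
# capped at two fix iterations as described above.
MAX_FIX_ROUNDS = 2

def quality_gate(output: str, critique, apply_fixes):
    report = critique(output)
    rounds = 0
    while report["verdict"] == "NEEDS_FIX" and rounds < MAX_FIX_ROUNDS:
        output = apply_fixes(output, report["issues"])
        report = critique(output)  # fresh review of the fixed output
        rounds += 1
    # After MAX_FIX_ROUNDS, deliver anyway with transparent disclosure
    return output, report, rounds
```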
LIGHT:
```
Message 1: Worker (opus, ultrathink)
Message 2: Critic (opus, ultrathink)
```

MEDIUM:
```
Message 1: 2 Explore agents ──── parallel
Message 2: 2-3 Workers ───────── parallel
Message 3: 1 Critic
```

HEAVY:
```
Message 1: 3-4 Explore agents ── parallel
Message 2: 1 Architect
Message 3: 3-5 Workers ───────── parallel
Message 4: 1 Critic
```

WRITING:
```
Message 1: 2 Explore agents ──── parallel
Message 2: 2 Writers ─────────── parallel
Message 3: 1 Fact-checker
Message 4: 1 Critic
```
- Claude Code → native skill integration
- Any Claude-based agent → use SKILL.md as a system prompt
- Custom pipelines → extract and adapt individual phases
- Other AI frameworks → the orchestration pattern is model-agnostic
Contributions welcome! Ideas:
- Additional task type patterns (DESIGN, DEVOPS, MIGRATION)
- Benchmarks: Deep Work vs. single-agent completion rates
- Integration templates for specific frameworks
- Phase customization hooks
- Translations (Chinese, Korean, Japanese)
A prompt-only multi-agent orchestration framework. No dependencies. No infrastructure. Just intelligence.