Spawn N parallel API workers for any batch task. Auto-optimizes worker count + stagger. Tiered model delegation: orchestrator plans (V4 Pro) → workers execute (V4 Flash). 99.95% API success rate.
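The core pattern in the summary above is N workers whose start times are spread out by a fixed stagger, so the first wave of requests does not reach the API as a single burst. It can be sketched with the standard library. This assumes "stagger" means the delay between successive worker start times. `fake_api_call`, `worker`, and `run_swarm` are hypothetical stand-ins, not DeepSwarm's `worker.py`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(item: int) -> int:
    """Hypothetical stand-in for one API request (real calls take seconds)."""
    time.sleep(0.01)
    return item * 2


def worker(worker_id: int, items: list[int], stagger_s: float) -> list[int]:
    # Worker i waits i * stagger_s before its first call, so start times
    # are spread out instead of all workers firing at t=0.
    time.sleep(worker_id * stagger_s)
    return [fake_api_call(x) for x in items]


def run_swarm(items: list[int], workers: int, stagger_s: float) -> list[int]:
    # Round-robin split: worker i gets items i, i+workers, i+2*workers, ...
    shards = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, i, shard, stagger_s)
                   for i, shard in enumerate(shards)]
        results: list[int] = []
        for f in futures:
            results.extend(f.result())
    return results


print(sorted(run_swarm(list(range(10)), workers=4, stagger_s=0.05)))
# prints [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```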
```bash
hermes skills tap add amanning3390/deepswarm
```

```bash
# 1. Define your task
cp task.yaml my_task.yaml
# Edit: prompt_template, worker_model, max_tokens

# 2. Generate seeds
python3 scripts/seed.py --count 1000 --template "Generate {{seed}}" > seeds.jsonl

# 3. Launch (auto-optimizes everything)
export DEEPSEEK_API_KEY=sk-...
python3 scripts/swarm.py --task my_task.yaml --total 1000
```

```yaml
orchestrator_model: deepseek-v4-pro   # Plans (few calls, frontier quality)
worker_model: deepseek-v4-flash       # Executes (many calls, cheaper)
```

V4 Pro costs ~3× V4 Flash per token. For batch tasks with thousands of calls, tiered delegation saves 60-70%.
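Here is a back-of-envelope check on the 60-70% figure. It assumes every call uses the same number of tokens and takes Pro at ~3× the Flash price. Under those assumptions, the saving approaches 1 − 1/3 ≈ 67% as worker calls come to dominate the planning calls. `tiered_savings` is an illustrative helper, not part of DeepSwarm.

```python
def tiered_savings(pro_to_flash_ratio: float,
                   orchestrator_calls: int,
                   worker_calls: int) -> float:
    """Fraction of cost saved vs. running every call on the Pro model.

    Simplifying assumption: all calls use the same token count.
    Cost unit: one Flash call = 1.0, one Pro call = pro_to_flash_ratio.
    """
    all_pro = (orchestrator_calls + worker_calls) * pro_to_flash_ratio
    tiered = orchestrator_calls * pro_to_flash_ratio + worker_calls * 1.0
    return 1 - tiered / all_pro


# 10 planning calls on Pro, 1,000 worker calls on Flash, Pro ~3x Flash
print(f"{tiered_savings(3.0, 10, 1000):.1%}")  # prints 66.0%
```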
| Call duration | Workers | Stagger | Success rate | Throughput (calls/hr) |
|---|---|---|---|---|
| <10s | 16 | 1s | 99.9% | ~5,760 |
| 10-30s | 12 | 2s | 99.9% | ~1,440 |
| 30-60s | 8 | 5s | 99.95% | ~440 |
| 60-90s | 6 | 10s | 99.9% | ~240 |
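The throughput figures follow a simple ceiling: workers × 3,600 s ÷ call duration, using the top of each duration band. The sketch below reproduces that arithmetic. `throughput_ceiling` is an illustrative name, not a DeepSwarm function, and it ignores stagger, retries, and rate-limit backoff.

```python
def throughput_ceiling(workers: int, call_seconds: float) -> float:
    """Calls per hour if every worker runs calls back-to-back with no gaps."""
    return workers * 3600 / call_seconds


# (workers, upper bound of the call-duration band in seconds), from the table
for workers, secs in [(16, 10), (12, 30), (8, 60), (6, 90)]:
    print(f"{workers} workers x {secs}s calls -> "
          f"~{throughput_ceiling(workers, secs):,.0f}/hr")
```

By this formula, the 30-60s row comes out at ~480/hr, against ~440/hr in the table. Treat the result as an upper bound rather than a prediction.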
If you omit `workers` and `stagger` from task.yaml, DeepSwarm first runs a calibration call and then picks values for both (see the table above).
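This minimal sketch shows how a timed calibration call could be mapped onto the rows of the table. It assumes the choice depends only on the measured call duration. `calibrate`, `pick_settings`, and the fallback for calls over 90 s are hypothetical and do not reproduce DeepSwarm's actual logic. The table's band edges overlap (10-30s, 30-60s), so the boundary handling here is one arbitrary choice.

```python
import time
from typing import Callable

# (upper bound in seconds, workers, stagger in seconds), copied from the table
BANDS = [
    (10, 16, 1),
    (30, 12, 2),
    (60, 8, 5),
    (90, 6, 10),
]


def pick_settings(call_seconds: float) -> tuple[int, int]:
    """Return (workers, stagger_s) for a measured call duration (hypothetical)."""
    for max_s, workers, stagger in BANDS:
        if call_seconds < max_s:
            return workers, stagger
    # Assumption: calls longer than 90 s reuse the most conservative row.
    _, workers, stagger = BANDS[-1]
    return workers, stagger


def calibrate(call: Callable[[], object]) -> tuple[int, int]:
    """Time one real call, then pick settings from its duration."""
    start = time.monotonic()
    call()
    return pick_settings(time.monotonic() - start)


print(pick_settings(45))  # prints (8, 5)
```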
Built-in task types: generation, translation, summarization, classification, and custom.
For multi-turn tasks (tool calling, conversation loops):

```yaml
multi_turn: true
max_turns: 20
```

```
deepswarm/
├── SKILL.md              # Hermes skill definition
├── README.md
├── task.yaml             # Sample task config
├── architecture.html     # Pipeline diagram
├── scripts/
│   ├── seed.py           # Seed generator (simple + rich templates)
│   ├── swarm.py          # Orchestrator (auto-optimize + launch)
│   ├── worker.py         # Task-agnostic batch processor
│   └── filter.py         # Quality filter with JSON repair
├── templates/
│   └── prompts.py        # v2 prompt templates
└── references/
    ├── api-rate-limits.md
    └── generation-patterns.md
```
After generation, filter the raw output to remove malformed traces:

```bash
python3 scripts/filter.py --input-dir output/ --output clean.jsonl --errors errors.jsonl
```

The filter applies three passes:

1. JSON repair, which targets the ~17% error rate in raw API output.
2. Structural validation, which checks think blocks and tag balance.
3. Length thresholds.

This typically lifts the pass rate from ~28% to ~62%.
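Here is a simplified sketch of the three passes, with several assumptions. The repair step only strips Markdown code fences and trailing commas, because filter.py's actual repair rules are not documented here. Structural validation is reduced to checking that `<think>`/`</think>` tags are balanced. The `"text"` field name and `MIN_CHARS` threshold are hypothetical placeholders.

```python
import json
import re
from typing import Optional

MIN_CHARS = 200  # hypothetical length threshold


def repair_json(raw: str) -> Optional[dict]:
    """Pass 1: parse as-is, then retry after two common fixes."""
    # Strip a surrounding ```json ... ``` fence if present.
    stripped = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # Drop trailing commas before a closing brace or bracket.
    no_trailing = re.sub(r",\s*([}\]])", r"\1", stripped)
    for text in (raw, stripped, no_trailing):
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


def tags_balanced(text: str, tag: str = "think") -> bool:
    """Pass 2: every <tag> is closed, and no </tag> appears before its opener."""
    depth = 0
    for m in re.finditer(rf"</?{tag}>", text):
        depth += -1 if m.group().startswith("</") else 1
        if depth < 0:
            return False
    return depth == 0


def filter_record(raw: str) -> tuple[Optional[dict], str]:
    """Run all three passes; return (record or None, reason)."""
    record = repair_json(raw)
    if record is None:
        return None, "json"
    text = str(record.get("text", ""))  # assumed field name
    if not tags_balanced(text):
        return None, "structure"
    if len(text) < MIN_CHARS:  # Pass 3: length threshold
        return None, "length"
    return record, "ok"
```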
Built from the DeepSeek Hermes Reasoning Traces project:
- 19,331 traces · 192K tool calls
- 96 workers · 31K API calls
- 99.95% success rate
- 8 workers + 5s stagger = the magic formula