# Multi-Agent Building

An orchestration system that leverages the Codex, Claude, Copilot, and Gemini CLIs for collaborative skill development. Supports multi-agent workflows with configurable agent assignments and A/B comparison testing.

## Quick Start

```bash
# Activate the virtual environment
source .venv/bin/activate

# Check agent status
mab status

# Run a task with all agents in parallel
mab run "Write a Python validator function" --mode parallel

# List available workflows
mab workflows

# Compare agent configurations on a workflow
mab compare skill_development --skill "my_skill" --desc "Skill description"
```

## Agents

All CLIs are pre-authenticated and ready to use:
| Agent | Command | Path | Strengths |
|---|---|---|---|
| Codex | `codex exec <prompt>` | `/usr/local/bin/codex` | Code generation, refactoring |
| Claude | `claude -p <prompt>` | `~/.local/bin/claude` | Complex reasoning, architecture |
| Copilot | `copilot -p <prompt>` | `/usr/local/bin/copilot` | Git workflows, quick suggestions |
| Gemini | `gemini -p <prompt>` | `/usr/local/bin/gemini` | Research, multi-modal |
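Since every agent exposes a non-interactive command, an async adapter can treat them uniformly as subprocesses. A minimal sketch of that pattern (the `run_agent` helper is illustrative, not the project's actual `adapters.py` API; `echo` stands in for a real agent CLI):

```python
import asyncio

async def run_agent(argv: list[str], timeout: float = 120.0) -> str:
    """Run one agent CLI non-interactively and capture its stdout."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        raise
    if proc.returncode != 0:
        raise RuntimeError(err.decode())
    return out.decode().strip()

# Demo with `echo` standing in for a real invocation such as
# `claude -p "<prompt>" --output-format text`:
print(asyncio.run(run_agent(["echo", "hello from agent"])))  # hello from agent
```

The same helper works for all four agents because each accepts its prompt as a command-line argument and writes its answer to stdout.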
## Project Structure

```
multi-agent-building/
├── orchestrator/             # Core Python package
│   ├── adapters.py           # Async CLI adapters (tested configs)
│   ├── cli.py                # Typer CLI (mab command)
│   ├── core.py               # Orchestrator with routing logic
│   ├── models.py             # Pydantic models
│   ├── workflows.py          # Workflow definitions
│   └── runner.py             # Workflow executor & comparison
├── scripts/
│   ├── discover_clis.py      # CLI discovery & validation
│   └── test_cli_configs.py   # Agent configuration testing
├── skills/
│   ├── templates/            # Antigravity SKILL.md templates
│   └── developed/            # Generated skills output
├── config/
│   ├── agents.yaml           # Agent capabilities & routing
│   ├── cli_configs.json      # Tested CLI configurations
│   └── cli_discovery.json    # Discovered CLI paths
├── results/                  # Workflow comparison results
└── pyproject.toml
```
## Workflows

### skill_development

Develop a new skill with research → design → implement → test → review.

Configurations:

- `claude_heavy`: Claude for complex tasks, Gemini for research
- `codex_heavy`: Codex for code tasks, Claude for design/review
- `distributed`: Each agent does what it's best at
- `copilot_heavy`: Copilot-centric workflow
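Each configuration boils down to a mapping from workflow step to agent. A hypothetical sketch of that routing table (the step names and per-step assignments here are illustrative, not the actual schema used by `runner.py` or `agents.yaml`):

```python
# Hypothetical step -> agent routing tables for the skill_development workflow.
CONFIGS = {
    "claude_heavy": {"research": "gemini", "design": "claude",
                     "implement": "claude", "test": "codex", "review": "claude"},
    "codex_heavy":  {"research": "gemini", "design": "claude",
                     "implement": "codex", "test": "codex", "review": "claude"},
    "distributed":  {"research": "gemini", "design": "claude",
                     "implement": "codex", "test": "copilot", "review": "claude"},
}

def agent_for(config: str, step: str) -> str:
    """Look up which agent a configuration assigns to a workflow step."""
    return CONFIGS[config][step]

print(agent_for("codex_heavy", "implement"))  # codex
```

Comparing configurations is then just running the same step sequence under different routing tables and recording the outcomes.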
### code_review

Multi-agent code review: security → quality → refactoring suggestions.

Configurations:

- `security_focused`: Claude for security analysis
- `balanced`: Distributed review responsibilities
## CLI Reference

```bash
# Agent management
mab status               # Show all agents and their status

# Task execution
mab run <prompt>         # Execute a task
  --type <type>          # Task type (code_generation, architecture, etc.)
  --mode <mode>          # single, parallel, sequential, consensus

# Skill development
mab develop <name>       # Start developing a new skill
mab skills               # List developed skills

# Workflow comparison
mab workflows            # List available workflows
mab compare <workflow>   # Run workflow with multiple agent configs
  --skill <name>         # Skill name for context
  --desc <description>   # Skill description
  -c <config>            # Specific configs to test (repeatable)
```

## Skill Format

Skills are generated in Antigravity-compatible format:
```markdown
---
name: skill-name
description: What the skill does and when to use it
---

# Skill Title

Instructions in Markdown...

## When to Use

- Trigger conditions

## Instructions

Step-by-step guidance

## Examples

Code examples
```

## Execution Modes

| Mode | Description |
|---|---|
| single | One agent handles the entire task |
| parallel | Multiple agents work simultaneously, outputs merged |
| sequential | Chain of agents, each building on previous output |
| consensus | Agents vote/agree on best approach |
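The modes differ mainly in how agent calls are scheduled. A toy sketch of the parallel and sequential patterns, with a stub standing in for the real CLI adapters (not the orchestrator's actual code):

```python
import asyncio

async def stub_agent(name: str, prompt: str) -> str:
    """Stand-in for a real async CLI adapter call."""
    await asyncio.sleep(0)  # yield control, as a real subprocess call would
    return f"{name}:{prompt}"

async def run_parallel(agents, prompt):
    # All agents get the same prompt; outputs are collected for merging.
    return await asyncio.gather(*(stub_agent(a, prompt) for a in agents))

async def run_sequential(agents, prompt):
    # Each agent builds on the previous agent's output.
    out = prompt
    for a in agents:
        out = await stub_agent(a, out)
    return out

agents = ["codex", "claude"]
print(asyncio.run(run_parallel(agents, "task")))    # ['codex:task', 'claude:task']
print(asyncio.run(run_sequential(agents, "task")))  # claude:codex:task
```

Single mode is the degenerate one-agent case, and consensus can be built on top of the parallel pattern by adding a voting step over the gathered outputs.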
## Tested CLI Configurations

Determined via `scripts/test_cli_configs.py`:

```bash
# Codex: Non-interactive code execution
codex exec "<prompt>" --skip-git-repo-check

# Claude: Print mode for scripting
claude -p "<prompt>" --output-format text

# Copilot: Silent mode with auto-approval
copilot -p "<prompt>" --allow-all-tools -s

# Gemini: Yolo mode for auto-approve
gemini -p "<prompt>" -y --output-format text
```

## Results

Workflow comparisons are saved to `results/` as JSON with:
- Per-step timing and agent used
- Success/failure status
- Output previews
- Summary metrics (fastest, most reliable)
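A sketch of how those summary metrics could be derived from per-step records (the field names and sample data are illustrative; check the actual JSON in `results/` for the real schema):

```python
import json

# Illustrative per-config records, mirroring the fields described above.
results = {
    "claude_heavy": {"steps": [{"agent": "claude", "seconds": 4.2, "ok": True},
                               {"agent": "gemini", "seconds": 2.1, "ok": True}]},
    "codex_heavy":  {"steps": [{"agent": "codex", "seconds": 1.5, "ok": True},
                               {"agent": "claude", "seconds": 3.0, "ok": False}]},
}

def summarize(results):
    """Reduce per-step records to the headline comparison metrics."""
    totals = {c: sum(s["seconds"] for s in r["steps"]) for c, r in results.items()}
    success = {c: all(s["ok"] for s in r["steps"]) for c, r in results.items()}
    return {
        "fastest": min(totals, key=totals.get),
        "most_reliable": max(success, key=success.get),  # True sorts above False
    }

print(json.dumps(summarize(results)))
# {"fastest": "codex_heavy", "most_reliable": "claude_heavy"}
```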
## License

MIT