AI gateway for GaryOS and autonomous agent stacks.
One OpenAI/Anthropic-compatible endpoint. Multi-provider fallback. Sensitivity-aware routing. Stage-policy combos. Langfuse auto-tagging.
Derived from OmniRoute, diverged at commit 1ee3a95ed7ffe991a24889ec8d47667667f3e7dc. Hard fork: no upstream remote.
Graze sits as a sidecar in the GaryOS Docker Compose stack. Every Claude Code worker subprocess spawned by the GaryOS pipeline daemon points ANTHROPIC_BASE_URL at Graze. Graze handles provider routing, fallback, sensitivity enforcement, and observability — transparently, without changes to agent files or pipeline code.
GaryOS docker-compose
├── gary-daemon ANTHROPIC_BASE_URL → http://graze:20128
├── gary-ui
└── graze ← this project, port 20128
For the full design rationale, workstream plan, and locked decisions, see docs/GRAZE.md.
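As a rough illustration of the wiring described above, a worker's environment could be built as in the sketch below. `workerEnv` and the `Env` type are hypothetical names invented for this example, not part of the GaryOS daemon; only `ANTHROPIC_BASE_URL` and the `http://graze:20128` address come from this README.

```typescript
// Hypothetical helper: builds the environment for a Claude Code worker
// subprocess so that its Anthropic traffic goes through Graze.
type Env = Record<string, string | undefined>;

// Address of the Graze sidecar inside the Compose network (from the diagram above).
const GRAZE_URL = "http://graze:20128";

function workerEnv(baseEnv: Env, grazeUrl: string = GRAZE_URL): Env {
  // Copy the parent environment and override only the base URL.
  // The input object is left untouched.
  return { ...baseEnv, ANTHROPIC_BASE_URL: grazeUrl };
}

// Example: the result would be passed as the `env` option when spawning a worker.
const exampleWorkerEnv = workerEnv({ PATH: "/usr/bin" });
```

Because only the environment changes, agent files and pipeline code stay as they are, which matches the "transparent" behaviour described above.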
Graze is built on OmniRoute's routing core. These are the additions specific to autonomous agent deployments:
| Addition | Status |
|---|---|
| Header contract (X-Gary-Action-Id, X-Gary-Stage, X-Gary-Playbook, X-Gary-Sensitivity) | Planned |
| Provider metadata (trains_on_data, data_residency, retention_days, local, e2ee) | Planned |
| Sensitivity routing (tier-1/2/3 provider filtering) | Planned |
| Langfuse auto-tagging (per-action, per-stage trace metadata) | Planned |
| Gary combo presets (gary-scout, gary-adversarial, etc.) | Planned |
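All of these additions are still planned, so the following is only a sketch of how the header contract and sensitivity filtering might fit together. The header names and metadata field names are taken from the table; the value formats, the field types, and especially the meaning assigned to each tier are assumptions made for illustration. docs/GRAZE.md remains the authoritative source.

```typescript
// Per-request context a GaryOS worker might attach (shape is an assumption).
interface GaryContext {
  actionId: string;
  stage: string;
  playbook: string;
  sensitivity: 1 | 2 | 3;
}

// Header names come from the table above; value formats are assumed.
function garyHeaders(ctx: GaryContext): Record<string, string> {
  return {
    "X-Gary-Action-Id": ctx.actionId,
    "X-Gary-Stage": ctx.stage,
    "X-Gary-Playbook": ctx.playbook,
    "X-Gary-Sensitivity": String(ctx.sensitivity),
  };
}

// Field names come from the table above; the types are assumed.
interface ProviderMeta {
  name: string;
  trains_on_data: boolean;
  data_residency: string;
  retention_days: number;
  local: boolean;
  e2ee: boolean;
}

// Hypothetical policy, NOT the project's real rules:
//   tier 1: any provider
//   tier 2: providers that do not train on request data
//   tier 3: as tier 2, and additionally local or end-to-end encrypted
function allowedProviders(providers: ProviderMeta[], tier: 1 | 2 | 3): ProviderMeta[] {
  return providers.filter((p) => {
    if (tier === 1) return true;
    if (tier === 2) return !p.trains_on_data;
    return !p.trains_on_data && (p.local || p.e2ee);
  });
}
```

Under this assumed policy, a request tagged with sensitivity 3 would only ever be routed to local or end-to-end encrypted providers, regardless of which routing strategy picks among them.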
# In GaryOS docker-compose stack — see pipeline/docker-compose.yml in garyos
docker compose up graze
# Standalone
cp .env.example .env
# Set JWT_SECRET, API_KEY_SECRET, INITIAL_PASSWORD
docker compose --profile base up -d
Default port: 20128. Dashboard at http://localhost:20128. API at http://localhost:20128/v1.
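Once the container is up, a client can talk to the API base URL like any OpenAI-compatible endpoint. The sketch below only builds the request. `buildChatRequest` is a hypothetical helper, the `/chat/completions` path follows the usual OpenAI convention, and the bearer-token header and model name are assumptions rather than confirmed Graze behaviour.

```typescript
interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles an OpenAI-style chat completion request.
function buildChatRequest(
  baseUrl: string, // e.g. "http://localhost:20128/v1"
  apiKey: string,  // assumed to be sent as a bearer token
  model: string,   // placeholder; use a model or combo configured in your instance
  prompt: string,
): ChatRequest {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildChatRequest("http://localhost:20128/v1", key, "some-model", "ping");
//   const res = await fetch(url, init);
```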
npm install
npm run dev # http://localhost:20128
npm run test:unit
npm run test:coverage # 60% minimum
npm run check # lint + test
See CLAUDE.md for the full developer guide including adding providers, routes, DB modules, MCP tools, and A2A skills.
Graze inherits the full OmniRoute feature set:
- OpenAI-compatible /v1/* endpoint (chat completions, embeddings, images, audio, video, reranking, moderation, web search)
- 160+ provider support via 16 executors
- 13 routing strategies (priority, weighted, round-robin, cost-optimized, context-relay, and more)
- Prompt compression (15–75% token reduction)
- 3-level proxy for geo-blocked regions
- MCP server (29 tools, 3 transports, 10 scopes)
- A2A Protocol (JSON-RPC 2.0 agent-to-agent)
- Memory and skills systems
- SQLite persistence (WAL, migrations, backups)
- Electron desktop app
- 4,600+ tests, 60% coverage gate
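To make the idea of priority routing with fallback concrete, here is a minimal sketch. It is not the routing core's actual code: `Target`, `priorityOrder`, and `callWithFallback` are hypothetical names, "lower number is tried first" is an assumed convention, and the call is synchronous only to keep the example short (real provider calls would be asynchronous).

```typescript
interface Target {
  name: string;
  priority: number; // assumption: lower value is tried first
  enabled: boolean;
}

// Enabled targets in the order they should be attempted.
// filter() returns a new array, so the caller's list is not reordered.
function priorityOrder(targets: Target[]): Target[] {
  return targets.filter((t) => t.enabled).sort((a, b) => a.priority - b.priority);
}

// Try each target in priority order and return the first successful result.
// `send` stands in for the real provider call and signals failure by throwing.
function callWithFallback<T>(
  targets: Target[],
  send: (t: Target) => T,
): { provider: string; result: T } {
  const errors: string[] = [];
  for (const t of priorityOrder(targets)) {
    try {
      return { provider: t.name, result: send(t) };
    } catch (err) {
      // Record the failure and fall through to the next target.
      errors.push(`${t.name}: ${String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

The other strategies listed above (weighted, round-robin, cost-optimized, and so on) would differ mainly in how the candidate order is produced; the fallback loop could stay the same.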
MIT. See LICENSE.