Summary
Best practice for production agent systems is to use canary/progressive rollouts for prompt versions, model changes, and configuration updates. Bernstein currently applies such changes globally, with no gradual rollout or automatic rollback mechanism.
Why this matters
- A bad prompt change can tank task completion rates across all agents
- Model upgrades can introduce regressions that aren't caught by unit tests
- Without canary deployment, you discover problems after they've affected all tasks
- LangSmith and Bedrock AgentCore both support instant rollback for this reason
Current state
- Configuration in .sdd/config.yaml applies globally and immediately
- Role templates in templates/roles/ are loaded on each spawn, so changes apply to all subsequent agents
- No concept of "prompt version" or "model configuration version"
- No automatic rollback on quality regression
Implementation guide
Step 1: Version prompt configurations
Create src/bernstein/core/config/prompt_versions.py:
- Hash each prompt template to create a version ID
- Store version history in .sdd/config/prompt_versions.json
- Track which version each task was executed with
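A minimal sketch of the hashing and history steps above, assuming a SHA-256 content hash truncated to 12 hex characters and a JSON layout that maps each template name to a list of version IDs. The function names, the ID length, and the file layout are illustrative assumptions, not existing Bernstein code.

```python
import hashlib
import json
from pathlib import Path


def prompt_version_id(template_text: str, length: int = 12) -> str:
    """Return a short, deterministic version ID derived from the template content."""
    digest = hashlib.sha256(template_text.encode("utf-8")).hexdigest()
    return digest[:length]


def record_version(history_path: Path, template_name: str, template_text: str) -> str:
    """Add the template's version ID to the JSON history if it is not already there.

    Assumed (hypothetical) layout: {"<template name>": ["<version id>", ...]}.
    """
    version = prompt_version_id(template_text)
    history: dict[str, list[str]] = {}
    if history_path.exists():
        history = json.loads(history_path.read_text(encoding="utf-8"))
    versions = history.setdefault(template_name, [])
    if version not in versions:
        versions.append(version)
        history_path.parent.mkdir(parents=True, exist_ok=True)
        history_path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return version
```

The returned ID is what a task record could store to track which version it was executed with. Because the ID depends only on the template text, reverting a template to earlier content yields the earlier ID again.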
Step 2: Canary allocation
In src/bernstein/core/routing/router_core.py:
- Add canary_percentage config (default: 0, meaning disabled)
- When > 0, route that percentage of tasks to the "canary" prompt/model version
- Rest use the "stable" version
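One possible way to implement the allocation is deterministic bucketing on the task ID, so a retried task always lands on the same variant. This is a sketch only: the function name assign_variant and the use of a task ID as the hash key are assumptions, and the actual router may prefer random sampling.

```python
import hashlib


def assign_variant(task_id: str, canary_percentage: int) -> str:
    """Return "canary" for roughly canary_percentage% of task IDs, else "stable"."""
    if not 0 <= canary_percentage <= 100:
        raise ValueError("canary_percentage must be between 0 and 100")
    if canary_percentage == 0:
        # Default configuration: canary routing is disabled.
        return "stable"
    # Map the task ID to a stable bucket in [0, 100).
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percentage else "stable"
```

With hashing, the realised split only approximates the configured percentage for small task counts; it converges as more tasks are routed.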
Step 3: Quality comparison
In src/bernstein/core/quality/:
- Compare quality gate pass rates between canary and stable
- After N canary tasks (configurable, default: 10), evaluate:
  - If canary pass rate >= stable pass rate: promote canary to stable
  - If canary pass rate is more than 10% below stable: roll back canary and alert
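The decision rule above could look roughly like the following. Two points are assumptions because the issue does not specify them: the 10% threshold is read as 10 percentage points, and a canary that trails stable by less than that keeps collecting data. VariantStats and evaluate_canary are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Quality gate results for one variant (hypothetical structure)."""
    passed: int = 0
    total: int = 0

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def evaluate_canary(
    canary: VariantStats,
    stable: VariantStats,
    min_canary_tasks: int = 10,
    rollback_margin: float = 0.10,
) -> str:
    """Return "promote", "rollback", or "continue"."""
    # Not enough evidence yet, or no stable baseline to compare against.
    if canary.total < min_canary_tasks or stable.total == 0:
        return "continue"
    if canary.pass_rate >= stable.pass_rate:
        return "promote"
    # Assumption: the margin is in absolute percentage points.
    if stable.pass_rate - canary.pass_rate > rollback_margin:
        return "rollback"
    # Assumption: a small deficit is inconclusive, so keep sampling.
    return "continue"
```

With only 10 canary tasks, a single failure moves the pass rate by 10 points, so a larger N or a statistical test may be worth considering before automating promotion.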
Step 4: CLI integration
- bernstein config canary --prompt-version <hash> --percentage 20: start a canary
- bernstein config canary --promote: promote the canary to stable
- bernstein config canary --rollback: revert the canary
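As a sketch of how these three invocations might be parsed and validated, the following uses the standard-library argparse. Bernstein's actual CLI framework is not stated in this issue, so the parser wiring, the helper names, and the returned dictionary shape are all assumptions.

```python
import argparse


def build_canary_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bernstein config canary")
    parser.add_argument("--prompt-version", metavar="HASH")
    parser.add_argument("--percentage", type=int, default=None)
    # --promote and --rollback cannot be given together.
    action = parser.add_mutually_exclusive_group()
    action.add_argument("--promote", action="store_true")
    action.add_argument("--rollback", action="store_true")
    return parser


def parse_canary_args(argv: list[str]) -> dict:
    """Validate flag combinations and return a small action description."""
    parser = build_canary_parser()
    args = parser.parse_args(argv)
    if args.promote or args.rollback:
        if args.prompt_version or args.percentage is not None:
            parser.error("--promote/--rollback take no other options")
        return {"action": "promote" if args.promote else "rollback"}
    if not args.prompt_version or args.percentage is None:
        parser.error("starting a canary requires --prompt-version and --percentage")
    if not 0 < args.percentage <= 100:
        parser.error("--percentage must be between 1 and 100")
    return {
        "action": "start",
        "prompt_version": args.prompt_version,
        "percentage": args.percentage,
    }
```

Invalid combinations exit through parser.error, which prints a usage message and raises SystemExit with status 2.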
Testing
- Test canary routing allocates correct percentage
- Test automatic promotion on quality match
- Test automatic rollback on quality regression
- Run: uv run pytest tests/unit/test_canary.py -x -q
Acceptance criteria