feat: canary deployment support for prompt and model configuration changes #810

@chernistry

Summary

Production agent systems should use canary (progressive) rollouts for prompt versions, model changes, and configuration updates. Bernstein currently applies such changes globally, with no gradual rollout and no automatic rollback mechanism.

Why this matters

  • A bad prompt change can tank task completion rates across all agents
  • Model upgrades can introduce regressions that aren't caught by unit tests
  • Without canary deployment, you discover problems after they've affected all tasks
  • LangSmith and Bedrock AgentCore both support instant rollback for this reason

Current state

  • Configuration in .sdd/config.yaml applies globally and immediately
  • Role templates in templates/roles/ are loaded on each spawn — changes apply to all subsequent agents
  • No concept of "prompt version" or "model configuration version"
  • No automatic rollback on quality regression

Implementation guide

Step 1: Version prompt configurations

Create src/bernstein/core/config/prompt_versions.py:

  • Hash each prompt template to create a version ID
  • Store version history in .sdd/config/prompt_versions.json
  • Track which version each task was executed with
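A minimal sketch of the hashing and history steps above, assuming SHA-256 truncated to 12 hex characters as the version ID and a JSON object keyed by template name as the history layout. The function names, the truncation length, and the JSON shape are all illustrative assumptions, not an existing Bernstein API.

```python
import hashlib
import json
import time
from pathlib import Path


def prompt_version_id(template_text: str, length: int = 12) -> str:
    """Return a short, stable version ID derived from the template contents."""
    digest = hashlib.sha256(template_text.encode("utf-8")).hexdigest()
    return digest[:length]


def record_version(history_path: Path, template_name: str, template_text: str) -> str:
    """Append the template's version to the history file if it is not there yet.

    Assumed layout: {"<template_name>": [{"version": ..., "created_at": ...}, ...]}
    """
    version = prompt_version_id(template_text)
    history = json.loads(history_path.read_text()) if history_path.exists() else {}
    entries = history.setdefault(template_name, [])
    if not any(entry["version"] == version for entry in entries):
        entries.append({"version": version, "created_at": time.time()})
        history_path.parent.mkdir(parents=True, exist_ok=True)
        history_path.write_text(json.dumps(history, indent=2))
    return version
```

Because the ID depends only on the template text, re-saving an unchanged template does not create a new version. The returned ID could be stored on each task record to track which version it ran with.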

Step 2: Canary allocation

In src/bernstein/core/routing/router_core.py:

  • Add canary_percentage config (default: 0, meaning disabled)
  • When > 0, route that percentage of tasks to the "canary" prompt/model version
  • Rest use the "stable" version
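One way to sketch the allocation rule above is deterministic hash bucketing: each task ID maps to a bucket in 0..99, and buckets below canary_percentage go to the canary. This is a design choice substituted here for plain random sampling, so that a retried task lands on the same version. The function name assign_variant and the use of a string task ID are assumptions for illustration.

```python
import hashlib


def assign_variant(task_id: str, canary_percentage: int) -> str:
    """Return "canary" or "stable" for a task, deterministically by task ID."""
    if not 0 <= canary_percentage <= 100:
        raise ValueError("canary_percentage must be between 0 and 100")
    if canary_percentage == 0:
        # Default of 0 means the canary is disabled.
        return "stable"
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percentage else "stable"
```

With this approach the observed canary share only approximates canary_percentage over many tasks, which is worth keeping in mind when writing the routing test.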

Step 3: Quality comparison

In src/bernstein/core/quality/:

  • Compare quality gate pass rates between canary and stable
  • After N canary tasks (configurable, default: 10), evaluate:
    • If canary pass rate >= stable: promote canary to stable
    • If canary pass rate is more than 10% below stable: roll back canary, alert
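The decision rule above might look like the following sketch. Two points are assumptions, since the issue does not specify them: the 10% threshold is read as absolute percentage points, and a canary that trails stable by 10 points or less keeps running rather than being promoted or rolled back. VariantStats and evaluate_canary are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Quality gate results for one variant (canary or stable)."""
    passed: int = 0
    total: int = 0

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def evaluate_canary(
    canary: VariantStats,
    stable: VariantStats,
    min_tasks: int = 10,
    rollback_margin: float = 0.10,
) -> str:
    """Return "promote", "rollback", or "continue"."""
    # Not enough data yet: keep collecting results.
    if canary.total < min_tasks or stable.total == 0:
        return "continue"
    if canary.pass_rate >= stable.pass_rate:
        return "promote"
    # Assumption: margin is in absolute percentage points, not relative.
    if stable.pass_rate - canary.pass_rate > rollback_margin:
        return "rollback"
    # Assumption: a small deficit is inconclusive, so the canary keeps running.
    return "continue"
```

With the default of 10 canary tasks, a single failure shifts the pass rate by 10 points, so a larger min_tasks may be needed for a stable signal.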

Step 4: CLI integration

  • bernstein config canary --prompt-version <hash> --percentage 20 to start canary
  • bernstein config canary --promote to promote canary
  • bernstein config canary --rollback to revert
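The flag combinations above could be parsed roughly as follows. This sketch uses argparse purely for illustration; whichever CLI framework Bernstein actually uses may differ, and build_canary_parser and resolve_action are hypothetical helpers. Only the flag names come from the commands listed above.

```python
import argparse


def build_canary_parser() -> argparse.ArgumentParser:
    """Parser for the flags of the proposed `bernstein config canary` command."""
    parser = argparse.ArgumentParser(prog="bernstein config canary")
    parser.add_argument("--prompt-version", metavar="HASH")
    parser.add_argument("--percentage", type=int, default=None)
    # --promote and --rollback cannot be combined with each other.
    action = parser.add_mutually_exclusive_group()
    action.add_argument("--promote", action="store_true")
    action.add_argument("--rollback", action="store_true")
    return parser


def resolve_action(args: argparse.Namespace) -> str:
    """Map parsed flags to one of "start", "promote", or "rollback"."""
    if args.promote:
        return "promote"
    if args.rollback:
        return "rollback"
    if args.prompt_version and args.percentage is not None:
        if not 0 < args.percentage <= 100:
            raise ValueError("--percentage must be between 1 and 100")
        return "start"
    raise ValueError("expected --promote, --rollback, or --prompt-version with --percentage")
```

For example, parsing ["--prompt-version", "abc123", "--percentage", "20"] and passing the result to resolve_action yields "start".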

Testing

  • Test canary routing allocates correct percentage
  • Test automatic promotion on quality match
  • Test automatic rollback on quality regression
  • Run: uv run pytest tests/unit/test_canary.py -x -q

Acceptance criteria

  • Prompt/model configs are versioned with hashes
  • Canary percentage routes a subset of tasks to new version
  • Quality comparison between canary and stable is automated
  • Auto-promotion and auto-rollback work based on quality thresholds
  • CLI commands for managing canary state
