
docs: ADR-0019/0020/0022 + promote ADR-0018 (routing evolution drafts) #310

Merged
Destynova2 merged 1 commit into main from docs/adr-routing-evolution-0019-0022 on Apr 28, 2026

@Destynova2
Contributor

Summary

Drafts 3 proposed ADRs (0019, 0020, 0022) covering routing primitives under ADR-0018's "nature-inspired routing" parent, and promotes ADR-0018 from proposed → accepted to ratify the parent direction before its child ADRs land. No code ships; each child ADR is status: proposed and gated behind a separate accepted-status promotion PR before any implementation.

Why now (re-evaluation drove this)

Two concrete user audiences emerged that change the cost/benefit math vs the original solo-dev framing:

  • Time-sensitive trading bots: fast failover and tail-latency reduction are operational requirements, not nice-to-haves.
  • Security-prevails customers (defense, banks, OIV): declarative auditable policies replace implicit priority-chain semantics.

For these audiences, the original "skip everything" assessment is wrong. The three landed ADRs are now operationally valuable.

What lands

ADR-0018 promotion (proposed → accepted)

A 2026-04-28 promotion note documents the trigger (the two audiences above). This unblocks 0019/0020/0022 from being "child ADRs of a zombie parent."

ADR-0019 EMA stigmergy (proposed)

  • Opt-in via [router] adaptive_scoring = "ema". Default off.
  • Transparent to clients (response shape unchanged).
  • Decay half-life returns scores to neutral; recovery is automatic.
  • Full Prometheus telemetry + Grafana dashboard template ships.
  • Trading defaults: alpha=0.4, decay=15m, skip<0.5.
  • Security-prevails defaults: skip<0.6, X-Grob-Routing response header mandatory.
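
The scoring behaviour described above could be sketched roughly as follows. This is an illustration only, not the ADR's implementation: the `EmaScore` class, a neutral score of 1.0, success/failure outcomes of 1.0/0.0, and reading `decay=15m` as a half-life are all assumptions made for the example.

```python
from dataclasses import dataclass

NEUTRAL = 1.0  # assumed neutral score; the ADR may define it differently


@dataclass
class EmaScore:
    """Hypothetical per-endpoint health score using the trading defaults."""
    alpha: float = 0.4            # EMA weight given to the newest observation
    half_life_s: float = 15 * 60  # assumes decay=15m means a 15-minute half-life
    skip_below: float = 0.5       # endpoints scoring below this are skipped
    value: float = NEUTRAL
    updated_at: float = 0.0       # timestamp (seconds) of the last update

    def _decay_to(self, now: float) -> None:
        # Pull the score toward NEUTRAL; the gap halves every half_life_s.
        elapsed = max(0.0, now - self.updated_at)
        factor = 0.5 ** (elapsed / self.half_life_s)
        self.value = NEUTRAL + (self.value - NEUTRAL) * factor
        self.updated_at = now

    def observe(self, success: bool, now: float) -> None:
        # Apply decay first, then fold the new outcome into the moving average.
        self._decay_to(now)
        outcome = 1.0 if success else 0.0
        self.value = self.alpha * outcome + (1 - self.alpha) * self.value

    def should_skip(self, now: float) -> bool:
        self._decay_to(now)
        return self.value < self.skip_below
```

Under these assumptions, two consecutive failures push a fresh endpoint below the 0.5 threshold, and one idle half-life brings it back above it without any explicit reset, which is the "recovery is automatic" property.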

ADR-0020 Hedged requests (proposed)

  • Opt-in per slot ([models.<slot>.hedge] enabled). Default off.
  • Safety guards: only_at_temperature_zero, skip_if_tools_present, max_concurrent_hedges_per_session.
  • New trust_zone_isolation flag: for security audiences, hedge target must share the same trust_zone tag as the primary endpoint (zones declared per-endpoint in 0022).
  • Trading defaults: enabled on default-model + search-model, hedge_after_ms=1000, target=least_loaded.
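
As a rough illustration of the hedging flow and its guards, here is a minimal asyncio sketch. The `Endpoint`, `HedgeConfig`, `may_hedge`, and `hedged_call` names and the `send` callback are hypothetical; only the option names come from the list above. It omits `max_concurrent_hedges_per_session` and target selection, and it propagates the first completed result even if that result is an error.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class Endpoint:
    name: str
    trust_zone: str  # assumed to be a single tag per endpoint


@dataclass(frozen=True)
class HedgeConfig:
    enabled: bool = False  # opt-in, default off
    hedge_after_ms: int = 1000
    only_at_temperature_zero: bool = True
    skip_if_tools_present: bool = True
    trust_zone_isolation: bool = False


def may_hedge(cfg: HedgeConfig, primary: Endpoint, backup: Endpoint,
              temperature: float, has_tools: bool) -> bool:
    """Return True only if every configured safety guard allows a hedge."""
    if not cfg.enabled:
        return False
    if cfg.only_at_temperature_zero and temperature != 0:
        return False
    if cfg.skip_if_tools_present and has_tools:
        return False
    if cfg.trust_zone_isolation and primary.trust_zone != backup.trust_zone:
        return False
    return True


async def hedged_call(send: Callable[[Endpoint], Awaitable[str]],
                      primary: Endpoint, backup: Endpoint, cfg: HedgeConfig,
                      temperature: float = 0.0, has_tools: bool = False) -> str:
    first = asyncio.create_task(send(primary))
    if not may_hedge(cfg, primary, backup, temperature, has_tools):
        return await first
    # Give the primary a head start of hedge_after_ms before duplicating.
    done, _ = await asyncio.wait({first}, timeout=cfg.hedge_after_ms / 1000)
    if done:
        return first.result()
    second = asyncio.create_task(send(backup))
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the losing request is cancelled
    return next(iter(done)).result()
```

Cancelling the loser on the client side does not guarantee the upstream stops generating, so a real implementation would still need to account for tokens spent on the abandoned request.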

ADR-0022 [[endpoints]] + [[policies]] schema rebuild (proposed)

  • Hard cut-over (no backward compat) per founder's "I have no users" decision.
  • BUT with two strong mitigations for security audiences:
    • Migration tool ships one minor release ahead so security teams can validate in staging.
    • Migration tool has --verify mode that runs both schemas against a deterministic test corpus and blocks the migration if any routing decision differs byte-for-byte.
  • trust_zone becomes a first-class endpoint field.
  • Compliance lint mode (grob policy validate --strict) rejects policies lacking explicit trust_zone filters.
  • Audit-trail continuity guaranteed via routing_signal_mapping.toml artifact emitted by the migration tool (allows audit-log post-processors to reconcile pre-cut and post-cut traffic).
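
The `--verify` gate and the strict lint can be pictured with a small sketch. Everything here is hypothetical: the `Router` callable type, the `verify_migration` and `strict_lint` helpers, and the policy shape (a dict with `name` and `trust_zone` keys) are assumptions for illustration, not the migration tool's or the CLI's actual interface.

```python
from typing import Callable, Iterable

# Hypothetical shape: a router maps a serialized request to a serialized
# routing decision, so both schemas can be compared byte-for-byte.
Router = Callable[[bytes], bytes]


def verify_migration(corpus: Iterable[bytes],
                     old_route: Router, new_route: Router) -> list[bytes]:
    """Return every corpus request whose decision differs between schemas."""
    return [req for req in corpus if old_route(req) != new_route(req)]


def strict_lint(policies: list[dict]) -> list[str]:
    """Return names of policies that lack an explicit trust_zone filter."""
    return [p.get("name", "<unnamed>")
            for p in policies if not p.get("trust_zone")]


def migration_allowed(corpus: Iterable[bytes],
                      old_route: Router, new_route: Router) -> bool:
    # Any differing decision blocks the migration.
    return not verify_migration(corpus, old_route, new_route)
```

The comparison only means something if the corpus and both routers are deterministic, which is consistent with the requirement that routing decisions be reproducible.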

What did NOT land

ADR-0021 Thompson sampling — explicitly rejected before drafting. Probabilistic 5%-exploration is incompatible with both:

  • Trading audit trails (regulatory: every routing decision must be deterministic and explainable in writing).
  • Security-prevails predictability requirements (compliance teams reject "we explore at 5%").

The rejection is documented in CHANGELOG and in the ADR-0018 promotion note.

Honest critique baked into the drafts

Each ADR has explicit "Negative Consequences" and "Open Questions" sections that flag known operational risks (hedge cancellation token-burn on some upstreams, Thompson cold-start, schema rebuild cost-without-users-yet). The drafts are deliberate about what they DON'T solve.

Test plan

  • No hardcoded version strings (CI guard).
  • All four ADR files parse with the same frontmatter format as ADR-0018.
  • CHANGELOG entry under [Unreleased] §"Routing roadmap" matches the landed files.
  • ADR-0021 file is removed (no orphaned references).
  • Docs lint passes in CI (markdownlint + lychee).

Auto-merge

Auto-merge intentionally NOT enabled. These are design docs; the founder reviews each before merge.

🤖 Generated with Claude Code

Destynova2 force-pushed the docs/adr-routing-evolution-0019-0022 branch 3 times, most recently from a88adf2 to 3dab9e2 on April 28, 2026 07:35
Drafts three proposed ADRs covering routing primitives in ADR-0018's
"nature-inspired routing" parent, AND promotes ADR-0018 from
status:proposed → status:accepted to ratify the parent direction
before its child ADRs land. None ship code in this PR; each child
ADR is status:proposed and gated behind a separate accepted-status
promotion PR before any implementation.

Re-evaluation against two concrete user audiences drove this:

- Time-sensitive trading bots: fast failover and tail-latency
  reduction are operational requirements, not nice-to-haves.
- Security-prevails customers (defense, banks, OIV): declarative
  auditable policies replace implicit priority-chain semantics.

Ratified parent (ADR-0018):
- Promoted to status:accepted with a 2026-04-28 promotion note.

Three child ADRs landed as status:proposed:

ADR-0019 EMA stigmergy
- Opt-in via [router] adaptive_scoring = "ema". Default off.
- Transparent to clients (response shape unchanged).
- Recovery via decay half-life back to declared priorities.
- Full Prometheus telemetry + Grafana dashboard template.
- Audience defaults: trading (alpha=0.4, decay=15m, skip<0.5);
  security-prevails (skip<0.6, X-Grob-Routing header mandatory).

ADR-0020 Hedged requests
- Opt-in per slot ([models.<slot>.hedge] enabled). Default off.
- Hard threshold + safety guards (only_at_temperature_zero,
  skip_if_tools_present, max_concurrent_hedges_per_session).
- New trust_zone_isolation flag for security audiences (hedge
  target must share trust zone with primary).
- Trading defaults: enabled on default+search slots,
  hedge_after_ms=1000, target=least_loaded.

ADR-0022 [[endpoints]] + [[policies]] schema rebuild
- Hard cut-over (no backward compat) per founder's "no production
  users" decision, BUT migration tool ships one minor release
  ahead AND the tool runs --verify mode for byte-identical routing
  decisions on a deterministic test corpus.
- trust_zone becomes a first-class endpoint field.
- Compliance lint mode (grob policy validate --strict) rejects
  policies lacking explicit trust_zone filters.
- Audit-trail continuity guaranteed via routing_signal_mapping.toml
  artifact emitted by the migration tool.

ADR-0021 (Thompson sampling) explicitly rejected before drafting:
probabilistic exploration is incompatible with both audience demands
for predictable, auditable routing. Documented in CHANGELOG +
ADR-0018 promotion note.

CHANGELOG entry under [Unreleased] §"Routing roadmap" lists the
3 proposed ADRs + 1 promoted parent. None shipping in 0.36.x.
Destynova2 force-pushed the docs/adr-routing-evolution-0019-0022 branch from 3dab9e2 to cd61627 on April 28, 2026 07:38
Destynova2 merged commit 34d4392 into main on Apr 28, 2026
31 checks passed
Destynova2 deleted the docs/adr-routing-evolution-0019-0022 branch on April 28, 2026 19:01