Skip to content

feat(email): failover ESP provider — secondary on primary reject/error (#35)#93

Merged
mastermanas805 merged 1 commit into
masterfrom
feat/email-failover-provider
Jun 5, 2026
Merged

feat(email): failover ESP provider — secondary on primary reject/error (#35)#93
mastermanas805 merged 1 commit into
masterfrom
feat/email-failover-provider

Conversation

@mastermanas805
Copy link
Copy Markdown
Member

What

Adds a FailoverProvider (worker/internal/email/failover_provider.go) that wraps an ordered chain of EmailProvider (primary first) and retries the same EventEmail through each secondary ESP when the primary errors or hard-rejects. A send is "successful" — and the forwarder claims the ledger row (worker convention 6 claim-after-2xx) — iff any provider in the chain returns success.

Directly hardens the standing P0 (Brevo sender unvalidated → every send rejected): wire EMAIL_PROVIDER_FALLBACK=ses and a Brevo account-level reject falls through to SES on the same payload.

Inert by default (safety contract)

EMAIL_PROVIDER_FALLBACK unset → NewProvider returns the bare single provider, byte-identical to today (no wrapper, no metric). Proven by TestFactory_NoFallback_ReturnsBareProvider.

Decision rule

Any non-success result (Transient / Permanent / SkippedNoTemplate) advances to the next provider; first success wins; all-fail returns the last error. All-skipped (no ESP has a template for the kind) returns the skip verbatim so the forwarder advances silently and does not trip all_failed.

Factory

  • single-provider build factored into buildOne(name, cfg) — primary + every fallback share one switch (no duplicated switch).
  • unknown fallback name → hard boot error; valid name with unset creds → skip-with-warning (fail-open; lets ops stage the flag ahead of the SES_* secret). All fallbacks skip → boots with primary only.

Metric (rule 25)

instant_email_failover_total{outcome=primary_ok|fallback_ok|all_failed} (lazy CounterVec — never emits in inert default). INFO on fallback engage, DEBUG on primary-ok, ERROR on all_failed. Emails masked in all logs.

Infra companion PR (alert + tile + catalog): InstaNode-dev/infra#feat/email-failover-observability.

Tests / gate

  • go test ./internal/email/ -coverPASS, 96.4% (≥95 floor); all new funcs 100%.
  • make gate (exact CI sequence) → green.

Delivery itself can't be verified (Brevo sender-unvalidated P0) — this PR asserts the failover contract + metric via tests; delivery awaits operator Brevo/SES validation.

🤖 Generated with Claude Code

#35)

Add a FailoverProvider that wraps an ordered chain of EmailProvider
(primary first) and retries the same EventEmail through each secondary
ESP when the primary errors or hard-rejects a send. A send is
"successful" (forwarder claims the row, preserving worker convention 6
claim-after-2xx) iff ANY provider in the chain returns success.

This directly hardens the standing P0 (Brevo sender unvalidated → every
send rejected): wire EMAIL_PROVIDER_FALLBACK=ses and a Brevo
account-level reject falls through to SES on the same payload.

Inert by default: EMAIL_PROVIDER_FALLBACK unset → NewProvider returns
the bare single provider, byte-identical to today (no wrapper, no
metric). Proven by TestFactory_NoFallback_ReturnsBareProvider.

- factory: single provider build factored into buildOne(name, cfg) so
  primary + fallbacks share ONE switch (no duplicated provider switch).
  Unknown fallback name → hard boot error; valid name with unset creds
  → skip-with-warning (fail-open, lets ops stage the flag ahead of the
  SES_* secret). If all fallbacks skip, boots with the primary only.
- config: EMAIL_PROVIDER_FALLBACK comma-separated → ordered Fallbacks.
- metric: instant_email_failover_total{outcome=primary_ok|fallback_ok|
  all_failed}. INFO on fallback engage (work performed), DEBUG on
  primary-ok (idle), ERROR on all_failed. Emails masked in all logs.
  Lazy *Vec — absent from /metrics until first emit (inert default).
- tests: in-package failover_provider_test.go (fake recording provider):
  primary-ok-no-fallback-call, primary-fail→fallback-ok (transient/
  permanent/skip), all-fail→last-error, all-skip→no all_failed metric,
  ordering, Name(), empty/nil guards, per-outcome metric increments,
  email-masking on every log path. Factory inert-default + with-fallback
  + unknown-name-errors + missing-creds-skipped. Config parse table.

internal/email per-package coverage 96.4% (≥95 floor); all new funcs
100%. make gate green. Infra alert/tile/catalog ship in a separate
infra PR (rule 25) — referenced in the PR body.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 enabled auto-merge (squash) June 5, 2026 09:39
@mastermanas805 mastermanas805 merged commit 695f026 into master Jun 5, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant