
Add product public-ingress synthetic monitoring and alerts #929

@shiny-code-bot

Finish Line

Launchplane continuously monitors product public ingress paths from outside the runtime and alerts when customer-facing URLs stop reaching the expected deployed runtime, so that problems such as stale DNS, redirect loops, broken temporary ingress, or a wrong revision are detected before customers report them.

Current Status

State: Split into implementation phases.
Next action: Build first-class public-ingress incident records and reconciliation, then add DB-backed notification policy and delivery drivers. Keep the scheduled monitoring/observation path from #975/#977/#978/#980; replace the legacy alert_issue_url comment model instead of validating it further.
Blocked by: None.
Waiting for: Implementation of the incident lifecycle and policy-backed delivery sub-issues.
Last verified: 2026-05-29. Scheduled Public Ingress Monitor runs succeeded at 2026-05-29T17:29:44Z and 2026-05-29T18:38:57Z on main SHA 8ce913c.
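
The incident reconciliation step in the next action could look roughly like the minimal Python sketch below. It assumes at most one open incident per (environment, URL) pair. Incident, IncidentStore, and reconcile are hypothetical names, and the in-memory list stands in for the DB-backed records. The returned event name is what a policy-backed delivery driver would consume.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical public-ingress incident record; not an existing Launchplane type.
    environment: str
    url: str
    opened_at: datetime
    classification: str
    resolved_at: Optional[datetime] = None


@dataclass
class IncidentStore:
    # In-memory stand-in for DB-backed incident storage (assumption).
    incidents: List[Incident] = field(default_factory=list)

    def find_open(self, environment: str, url: str) -> Optional[Incident]:
        for incident in self.incidents:
            if (
                incident.environment == environment
                and incident.url == url
                and incident.resolved_at is None
            ):
                return incident
        return None


def reconcile(
    store: IncidentStore,
    environment: str,
    url: str,
    healthy: bool,
    classification: str,
    now: datetime,
) -> Optional[str]:
    """Fold one observation into the incident records.

    Returns "opened", "reclassified", or "resolved" when the lifecycle changes,
    or None when nothing new should be delivered.
    """
    current = store.find_open(environment, url)
    if healthy:
        if current is None:
            return None  # still healthy, nothing to do
        current.resolved_at = now
        return "resolved"
    if current is None:
        store.incidents.append(Incident(environment, url, now, classification))
        return "opened"
    if current.classification != classification:
        current.classification = classification  # e.g. a timeout became a redirect loop
        return "reclassified"
    return None  # same ongoing incident; no repeat notification
```

Keying notifications on the open incident rather than on each failed run keeps repeated identical failures from producing repeated notifications.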

Problem

SellYourOutboard production was healthy at the app/runtime layer, but its temporary public ingress route depended on a local nginx path whose public IP changed. Because the route was temporary, DDNS was not configured. Launchplane deployment health did not catch this because the container and expected runtime identity were still healthy behind Dokploy/Tailscale.

This exposed a gap: Launchplane tracks product deployments and runtime identity, but it does not yet perform synthetic checks against the same public URLs customers use.

Proposed Scope

Add a Launchplane-owned public watcher for product environments, starting with generic-web stable lanes.

Initial checks should include:

  • Fetch each product environment's public base URL and health URL over an external, public network path rather than from inside the runtime.
  • Detect timeouts, TLS failures, HTTP 4xx/5xx responses, redirect loops, and self-redirects.
  • Verify /api/health returns 200 when configured.
  • Compare the health revision/runtime identity against Launchplane's expected deployment artifact when available.
  • Record observation history in Launchplane so product environment views distinguish app health from public ingress health.
  • Alert only on state transitions or sustained failures to avoid noisy repeats.
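
The classification part of these checks can be sketched in Python as a pure function over one probe result. This assumes a fetcher (not shown) that does not auto-follow redirects, records every hop with its status and Location header, and reports network-level failures as a label. The type names and labels ("self_redirect", "wrong_revision", and so on) are hypothetical. Stale-DNS detection, such as comparing the resolved IP against an expected ingress, is not covered here.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urljoin


@dataclass
class Hop:
    url: str
    status: int
    location: Optional[str] = None  # Location header, if any


@dataclass
class Probe:
    # Result of fetching one public URL while recording each redirect hop.
    hops: List[Hop] = field(default_factory=list)
    error: Optional[str] = None  # hypothetical fetcher labels: "dns", "timeout", "tls"
    health_revision: Optional[str] = None  # revision reported by the health endpoint


def classify(probe: Probe, expected_revision: Optional[str] = None) -> str:
    """Map one probe to a likely failing layer, or "ok"."""
    if probe.error is not None:
        return probe.error  # failed before any HTTP response
    if not probe.hops:
        return "no_response"
    visited = set()
    for hop in probe.hops:
        if hop.url in visited:
            return "redirect_loop"  # came back to a URL already fetched
        visited.add(hop.url)
        if 300 <= hop.status < 400 and hop.location:
            if urljoin(hop.url, hop.location) == hop.url:
                return "self_redirect"  # redirects straight back to itself
    final = probe.hops[-1]
    if 300 <= final.status < 400:
        return "too_many_redirects"  # fetcher gave up while still being redirected
    if final.status >= 500:
        return "http_5xx"
    if final.status >= 400:
        return "http_4xx"
    if final.status != 200:
        return "unexpected_status"
    if expected_revision is not None and probe.health_revision != expected_revision:
        return "wrong_revision"  # reachable, but not the expected deployment
    return "ok"
```

Returning the first matching layer, in order from network to HTTP to revision, keeps each observation to a single label, which is what an incident record needs to compare against.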

Acceptance Criteria

  • A product environment can declare public synthetic check URLs and expected runtime identity behavior.
  • A scheduled watcher records pass/fail observations for at least sellyouroutboard prod.
  • Failures classify the likely layer, for example stale DNS, connection timeout, redirect loop, wrong runtime revision, or app health failure.
  • Launchplane product environment reads expose latest synthetic public-ingress status separately from deployment/runtime status.
  • An alert is sent when a previously healthy public URL becomes unhealthy and a recovery notice is sent when it recovers.
  • The first implementation avoids secret exposure in logs, issues, and alerts.
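
The declaration, transition-alert, and no-secrets criteria can be sketched together in Python. PublicCheck, CheckState, next_notice, redact_url, and format_notice are hypothetical names. The threshold of three consecutive failures is an assumed definition of "sustained", and the environment identifier in the usage is illustrative. Alerts are only emitted on a healthy-to-unhealthy transition and on recovery, and URLs are stripped of credentials, query strings, and fragments before they reach a notice.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


@dataclass
class PublicCheck:
    # Hypothetical per-environment declaration; field names are assumptions.
    environment: str
    url: str
    expected_runtime_revision: Optional[str] = None  # None = skip revision comparison
    failures_before_alert: int = 3  # assumed threshold for a "sustained" failure


@dataclass
class CheckState:
    alerted_healthy: bool = True  # last state a notice was sent for
    consecutive_failures: int = 0


def next_notice(check: PublicCheck, state: CheckState, passed: bool) -> Optional[str]:
    """Return "alert" or "recovered" on a state transition, else None."""
    if passed:
        state.consecutive_failures = 0
        if not state.alerted_healthy:
            state.alerted_healthy = True
            return "recovered"
        return None
    state.consecutive_failures += 1
    if state.alerted_healthy and state.consecutive_failures >= check.failures_before_alert:
        state.alerted_healthy = False
        return "alert"
    return None  # either not sustained yet, or already alerted


def redact_url(url: str) -> str:
    """Keep scheme, host, port, and path; drop credentials, query, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def format_notice(check: PublicCheck, event: str) -> str:
    return f"[{check.environment}] public ingress {event}: {redact_url(check.url)}"
```

A single failure followed by a pass produces no notice at all, which is the intended trade-off: slightly slower detection in exchange for no alerts from one-off network blips.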

Design Notes

Cloudflare may be able to reroute public traffic automatically through Load Balancing pools and health monitors. That should be evaluated as either:

  • a remediation path the watcher recommends, or
  • a later integration where Launchplane manages Cloudflare pool health/traffic steering for products with multiple ingress candidates.

This issue should start with detection and alerting. Automatic failover should be a separate explicit design decision because products may have temporary, direct, DDNS, Cloudflare-proxied, or Tailscale-backed ingress paths.

Incident Context

During investigation:

  • Dokploy application syo-prod-app was healthy and running the expected SellYourOutboard image.
  • Launchplane prod inventory pointed to the expected artifact/revision.
  • Public DNS/ingress did not consistently reach that runtime.
  • A manually tested CM ingress IP accepted connections on ports 80/443 but returned a self-redirect loop for both / and /api/health.
  • The durable lesson is that deployment health and public customer reachability are separate signals.
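
One way to keep the two signals separate in a product environment read is sketched below in Python, with hypothetical type and field names. Fed the state observed during this incident (runtime healthy on the expected revision, public URL self-redirecting), it reports deployment OK and public ingress failing instead of folding both into a single health flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentStatus:
    expected_revision: str
    running_revision: str
    runtime_healthy: bool

    @property
    def ok(self) -> bool:
        return self.runtime_healthy and self.running_revision == self.expected_revision


@dataclass
class PublicIngressStatus:
    url: str
    classification: str  # e.g. "ok", "self_redirect" (hypothetical labels)
    observed_at: str  # ISO 8601 timestamp of the latest observation

    @property
    def ok(self) -> bool:
        return self.classification == "ok"


@dataclass
class ProductEnvironmentView:
    environment: str
    deployment: DeploymentStatus
    public_ingress: Optional[PublicIngressStatus]  # None until the watcher has run

    def summary(self) -> dict:
        # Each signal is reported on its own; neither overrides the other.
        return {
            "environment": self.environment,
            "deployment_ok": self.deployment.ok,
            "public_ingress_ok": None if self.public_ingress is None else self.public_ingress.ok,
        }
```

Reporting None rather than True when no public observation exists avoids implying customer reachability that was never checked.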

Relationships

Related product repo: cbusillo/sellyouroutboard.
Related operational theme: generic-web public environment readiness and post-deploy verification.

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request), plan (Durable planning issue)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
